Preface


The purpose of these notes is to provide a practical introduction to forward-error- 
correcting coding principles. The document is somewhere between a review and a how-to 
handbook. Emphasis is on terms, definitions, and basic calculations that should prove useful 
to the engineer seeking a quick look at the area. To this end, 41 example problems are 
completely worked out. A glossary appears at the end, as well as an appendix concerning 
the Q function. The motivation for this document is as follows: The basic concepts of 
coding can be found in textbooks devoted to communications principles or in those dealing 
exclusively with coding. Although each is admirable in its intent, no elementary treatment, 
useful for quick calculations on the job, exists. I have taken a short course on coding, given 
by Prof. E.J. Weldon, Jr., as well as one given in-house at NASA Lewis. These notes are for 
those who have not had the time either to take such courses or to study the literature in some 
detail. 

The material included is primarily that found in basic textbooks and short courses. The 
reader should not anticipate developing sufficient skills to actually design a code for a 
specific purpose. Rather, the reader should be far enough along the learning curve to be able 
to read and understand the technical literature (e.g., IEEE Transactions on Information 
Theory). The topics I chose to discuss here were those that almost always cropped up in the 
references and apparently are the ones the beginner should learn. The emphasis is on 
definitions, concepts, and analytical measures of performance whenever possible. 

The “questions of coding” from an engineer's viewpoint may be stated as, Should coding 
be used? And if so, which code? What performance improvement can be expected? A basic 
measure of performance is the coding gain, but establishing an accurate formula is not a 
trivial exercise. Here, I summarize the essential process to determine approximate values. 
Some software packages are now available to permit simulations, but they are more 
appropriate for true experts on coding. 

Here, I consider “coding” to be only forward error correcting (FEC), as opposed to other 
uses of the term, which are source coding, encryption, spreading, etc. In practice, code 
performance is modulation dependent; thus, the code should be matched to both the 
channel's characteristics and the demodulator's properties. This matching is seldom, if ever, 
done. Usually, some standard, well-established code is used, and its appropriateness is 
determined by the closeness of the bit error rate to system specifications. 

A goal of these notes is to present an orderly development, with enough examples to 
provide some intuition. Chapter 1 reviews information theory and defines the terms “self, 
mutual, and transmitted information.” Chapter 2 reviews channel transfer concepts. 
Chapter 3 treats modulo-2 arithmetic and channel terminology. Chapter 4 gives an overview 
of block coding. Chapter 5 goes deeper into block codes, their performance, and some 
decoder strategies and attempts to cover finite field algebra, so that the beginner can start 
reading the literature. A code may be looked upon as a finite set of elements that are 
processed by shift registers. The properties of such registers, along with those of the code
elements, are used to produce coders and decoders (codecs). The mathematics of finite
fields, often referred to as “modern algebra,” is the working analysis tool in the area, but
most engineers are not well grounded in its concepts. Chapter 6 introduces convolutional 
coders, and chapter 7 covers decoding of convolutional codes. Viterbi and sequential 
decoding strategies are treated. 

No attempt at originality is stated or implied; the examples are blends of basic problems 
found in the references and in course notes from various short courses. Some are solutions 
of chapter problems that seemed to shed light on basic points. Any and all errors and 
incorrect “opinions” expressed are my own, and I would appreciate the reader alerting me to
them.
Chapter 1

Information Theory 

Both this and the following chapter discuss information, its measure, and its transmission through a 
communications channel. Information theory gives a quantitative measure of the “information content” in a 
given message, which is defined as the ordering of letters and spaces on a page. The intuitive properties of 
information are as follows: 

1. A message with more information should occur less often than one with less information. 

2. The more “uncertainty” contained in a message, the greater the information carried by that message. For 
example, the phrase “we are in a hurricane” carries more information than “the wind is 10 mph from the 
southwest.” 

3. The information of unrelated events, taken as a single event, should equal the sum of the information of 
the unrelated events. 

These intuitive concepts of “information” force the mathematical definitions in this chapter. Properties 1 and 
2 imply probability concepts, and these along with the last property imply a logarithmic functional 
relationship. In other words, the amount of information should be proportional to the message length, and it 
should increase appropriately with the richness of the alphabet in which it is encoded. The more symbols in the 
alphabet, the greater the number of different messages of length n that can be written in it. 

The notion of self-information is introduced with two examples. 

Example 1.1 

Assume a 26-character alphabet and that each character occurs with the same frequency (equally likely). 
Assume m characters per page, and let each page comprise a single message. Then, the total number of 
possible messages on a given page is determined as follows: Let the position of each character be called a slot; 
then, 

1. First slot can be filled in 26 ways. 

2. Second slot can be filled in 26 ways, etc. 

Because there are m slots per page, there are (26)(26)...(26) (that is, m factors) or $26^m$ possible arrangements.
(In general, the number of permutations N of k alphabetic symbols, taken n at a time, is

$$N = k^n$$

and each permutation is considered a message.)

Define each arrangement as a message. The number of possible messages on two pages is $26^{2m}$. Now by
intuition assume that two pages will carry twice as much information as does one page. Taking logarithms of
the possible arrangements yields

$$\frac{\log\left[26^{2m}\right]}{\log\left[26^{m}\right]} = 2 = \frac{\text{information on 2 pages}}{\text{information on 1 page}}$$

Thus, the log of the total number of available messages seems to make some sense. (The end of an example
will henceforth be designated with a triangle ▲.)
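The counting argument of example 1.1 can be checked numerically. The following is a minimal sketch; the page size `m = 40` and the function names are illustrative assumptions, not part of the original notes.

```python
import math

def message_count(alphabet_size: int, slots: int) -> int:
    """Number of distinct messages: one of alphabet_size symbols in each slot."""
    return alphabet_size ** slots

def log_message_count(alphabet_size: int, slots: int, base: float = 2.0) -> float:
    """log(alphabet_size ** slots), computed as slots * log(alphabet_size)."""
    return slots * math.log(alphabet_size, base)

m = 40  # hypothetical number of character slots per page
one_page = log_message_count(26, m)
two_pages = log_message_count(26, 2 * m)
ratio = two_pages / one_page  # two pages should carry twice the information
print(f"one page: {one_page:.2f}, two pages: {two_pages:.2f}, ratio: {ratio:.1f}")
```

The ratio does not depend on the base of the logarithm or on the value chosen for `m`, which is the point of the example.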

Before moving on, I must discuss pulses, binary digits, and symbols. In general, a source of information 
(e.g., the digital modulator output) will emit strings of pulses. These may have any number of amplitudes, but 
in most cases only two amplitudes are used (thus, binary pulses). The two amplitudes are represented 
mathematically by the digits 0 and 1 and are called binary digits. Thus, electrical pulses and binary digits 
become synonymous in this area. Often, groups of binary digits are processed together in a system, and these 
groups are called symbols. 

Definitions 

Binit — A binary digit, 0 or 1. Also called a bit. 

Baud — The unit of signaling speed, quite often the number of symbols transmitted per second. Note that 
although baud is a rate, quite often the words “baud rate” are given, so that the meaning is basically vague. The 
speed in bauds is equal to the number of “signaling elements” sent per second. These signaling elements may 
or may not be groups of binary digits (someone could mean amplitudes of sine waves, etc.). Therefore, a more 
general definition is the number of signal events per second. Baud is also given a time interval meaning; it is 
the time interval between modulation envelope changes. Also, one finds the phrase “the duration of a channel
symbol.”

Example 1.2 

Consider a source emitting symbols at a rate of $1/T$ symbols per second. Assume that m distinct message
symbols are available, denoted by $x_1, x_2, \ldots, x_m$ and together represented by $x$. For simplicity, at this point,
assume that each symbol can occur with the same probability. The transmission of any single symbol will
represent a certain quantity of information (call it $I$). Because all symbols are equally likely, it seems
reasonable that all carry the same amount of information. Assume that $I$ depends on m in some way:

$$I = f(m) \qquad \text{(a)}$$

where $f$ is to be determined. If a second symbol, independent of the first one, is sent in a succeeding interval,
another quantity of information $I$ is received. Assume that the information provided by both is $I + I = 2I$. Now,
if there are m alternatives in one interval, there are $m^2$ alternative pairs in both intervals (taken as a single event
in the time $2T$). Thus,

$$2I = f(m^2) \qquad \text{(b)}$$

In general, for k intervals

$$kI = f(m^k) \qquad \text{(c)}$$

The simplest function to satisfy equation (c) is log; thus,

$$f(m) = A \log m$$
where A is a constant of proportionality and the base of the log is immaterial. The common convention is to
define the self-information of an m-symbol, equally likely source as

$$I = \log_2 m \quad \text{bits} \qquad \text{(d)}$$

when the base is chosen as 2. Observe that the unit for information measure is bits. The value of equation
(d) is the quantitative measure of information content in any one of the m symbols that may be emitted.

Observe in example 1.2 that

$$I = \log_2 m = -\log_2\left(\frac{1}{m}\right) = -\log_2(p_i) \qquad (1.1)$$

The probability of any symbol occurring, $p_i = 1/m$, is used to generalize to the case where each message
symbol $x_i$ has a specified probability of occurrence $p_i$.

Definition 

Let $x_i$ occur with probability $p(x_i)$; then, the self-information contained in $x_i$ is

$$I(x_i) \triangleq -\log_2 p(x_i), \qquad i = 1, 2, \ldots, m \qquad (1.2)$$

Next, the average amount of information in any given symbol is found for the entire ensemble of m available 
messages. 

Definition 


$$\langle I(x_i) \rangle = \sum_{i=1}^{m} p(x_i)\, I(x_i) = H(x) \qquad (1.3)$$

where $H(x)$ is the average self-information or self-entropy in any given message (symbol). It is also called the
entropy function. The average self-entropy in $x_i$ can also be defined as

$$H(x_i) = p(x_i)\, I(x_i) \qquad (1.4)$$

Finally,

$$H(x) \triangleq -\sum_{i=1}^{m} p(x_i) \log_2 p(x_i) \quad \text{bits/symbol} \qquad (1.5)$$

or in briefer notation

$$H(x) = -\sum_{i=1}^{m} p(i) \log p(i) \qquad (1.6)$$
Observe for the special case of equally likely events, $p(x_i) = 1/m$,

$$I(x_i) = \log_2 m$$

$$H(x) = \sum_{i=1}^{m} \frac{1}{m} \log_2 m = \log_2 m$$

or

$$H(x) = I(x_i) \qquad (1.7)$$
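The definitions of self-information and entropy translate directly into a few lines of code. This is a minimal sketch; the function names and the two probability lists are hypothetical choices made for illustration.

```python
import math

def self_information(p: float) -> float:
    """Self-information -log2(p), in bits, of a symbol with probability p."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Average self-information in bits/symbol; zero-probability terms contribute nothing."""
    return sum(p * self_information(p) for p in probs if p > 0)

m = 8
uniform = [1 / m] * m                  # equally likely symbols
skewed = [0.5, 0.25, 0.125, 0.125]     # hypothetical unequal probabilities

print("uniform source:", entropy(uniform), "bits/symbol")
print("skewed source: ", entropy(skewed), "bits/symbol")
```

For the uniform source the entropy equals the self-information of any one symbol, $\log_2 m$; for the skewed source it falls below $\log_2 4$.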

The logarithmic variation satisfies property 3 as follows: The term “unrelated events” means independence
between events. For events $\alpha$ and $\beta$, the joint probability is

$$p(\alpha \cap \beta) = p(\alpha, \beta) = p(\alpha\beta)$$

(these notations are found in the literature). Then,

$$p(\alpha, \beta) = p(\alpha|\beta)\, p(\beta) = p(\alpha)\, p(\beta) \qquad (1.8)$$

where the second equality means $p(\alpha|\beta) = p(\alpha)$, which defines independence between $\alpha$ and $\beta$. Hence, if $\alpha$
and $\beta$ are independent,

$$I(\alpha \cap \beta) = I(\alpha, \beta) = -\log p(\alpha, \beta) = -\log[p(\alpha)\, p(\beta)] = -\log p(\alpha) - \log p(\beta) = I(\alpha) + I(\beta)$$

or the information in both events, $I(\alpha, \beta)$, is the sum of the information contained in each.
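As a quick numerical illustration of the additive property, the sketch below uses two hypothetical event probabilities (0.5 and 0.125, chosen only because their logarithms are whole numbers) and assumes the events are independent.

```python
import math

def self_information(p: float) -> float:
    """Self-information -log2(p) in bits."""
    return -math.log2(p)

p_alpha = 0.5     # hypothetical probability of event alpha
p_beta = 0.125    # hypothetical probability of event beta
p_joint = p_alpha * p_beta  # independence: p(alpha, beta) = p(alpha) p(beta)

info_joint = self_information(p_joint)
info_sum = self_information(p_alpha) + self_information(p_beta)
print(info_joint, info_sum)
```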

Notation in this area is varied and one must become accustomed to the various forms. Thus, in the literature 
either capital P or p is used for probabilities, probability densities, or probability distributions. The meaning is 
clear in all cases. Recall that in probability theory the words “density” and “distribution” are used 
interchangeably and one must adjust accordingly. In this document, the notation is as consistent as possible. 
Observe carefully in the preceding discussion the interplay between self-information, average information over 
the ensemble, and average information about a specific symbol. Coupling this with several binary digits per 
symbol and noting that the units for self-information are bits gives a rich mixture for endless confusion. Also, 
the special case for equally likely events is often used in examples in the literature, and many of this case's
results are, of course, not true in general.

Aside 

The relationship to thermodynamics is as follows: First, recall the evolution of the entropy concept. The
change in entropy in a system moving between two different equilibrium states is

$$S_2 - S_1 = \int_1^2 \left.\frac{\text{đ}Q}{T}\right|_{\text{reversible}} \qquad (1.9)$$

where $S_2 - S_1$ is the entropy change, đQ is the change in heat (positive if moving into the system), and T is the
temperature at which it is exchanged with the surroundings. The slash through the symbol for the change in Q
alerts the reader that heat (Q) is not a perfect differential. The constraint “reversible” means that the change
from state 1 to state 2 occurs over a sequence (path) of intermediate equilibrium states. A “reversible path”
means no turbulence, etc., in the gas. In general,

$$S_2 - S_1 \ge \int_1^2 \frac{\text{đ}Q}{T} \qquad (1.10)$$

and the equality only occurs for reversible (physically impossible, ideal situations) changes. Later, another
definition arose from statistical thermodynamics, that is,

$$S = k \ln W \qquad (1.11)$$

which is apparently an absolute measure (not just a change). Here, k is Boltzmann’s constant and W is the
“thermodynamic probability” of the state of interest. Unlike normal probabilities, W is never less than 1.
It represents the number of microscopic different arrangements of molecules that yield the same macroscopic
(measurable quantities are identical) state. The calculation of W starts from first principles. From the theory, the
equilibrium state of a system has the largest W and hence the maximum entropy. Another concept from
statistical thermodynamics is the distribution function of a system, $f$. It is defined by

$$dN = f(x, y, z, v_x, v_y, v_z, t)\, dx\, dy\, dz\, dv_x\, dv_y\, dv_z = f(\mathbf{r}, \mathbf{v}, t)\, d\mathbf{r}\, d\mathbf{v}$$

which means the number of particles at point $(x, y, z)$ with velocity components $(v_x, v_y, v_z)$ at time t. Note that
$f$ is a particle density function:

$$f = \frac{\text{number of particles}}{\text{vol (real space)} \cdot \text{vol (velocity space)}} = \frac{dN}{d\mathbf{r}\, d\mathbf{v}} \qquad (1.12)$$

Then, Boltzmann’s H theorem states that

$$H \triangleq \iint f \ln f \, d\mathbf{r}\, d\mathbf{v} \qquad (1.13)$$

and he showed that

$$H = -(\text{constant})\, S_{\text{classical}}$$

where $S_{\text{classical}}$ is the classical entropy of thermodynamics. Basically, this says that the measured entropy is the
average over the distribution functions available to the system. The reason for the log variation in equation 
(1.11) is as follows: Assume that the entropy of a system in a given state is some function g of the 
thermodynamic probability of being in that state, that is, 


$$S_A = k\, g(W_A)$$

where the subscript A denotes the state of interest and $W_A$ is known by some method. If a similar system is in
state B,

$$S_B = k\, g(W_B)$$

From experiments, it was known that if the systems were mixed (combined), the resulting entropy $S_{AB}$ was

$$S_{AB} = S_A + S_B$$

Therefore,

$$S_{AB} = k\, g(W_{AB}) = k\, g(W_A) + k\, g(W_B)$$

But if $W_{AB}$ is the number of arrangements of the combined system, then from counting rules

$$W_{AB} = W_A W_B$$

A little reflection shows that a possible choice for g is log:

$$S_{AB} = k \ln(W_{AB}) = k \ln(W_A W_B) = k \ln(W_A) + k \ln(W_B) = S_A + S_B$$

From this, the logarithmic variation was born. (End of aside.)

Observe that equations (1.5) and (1.13) are similar in form; Shannon (1948) mentions this in his paper. For 
this reason, he chose the symbol H and the name entropy for the average information. The link with 
thermodynamics can be established as follows: Consider a container of gas with all molecules in one corner.
Because in this condition the “uncertainty” in the position of any molecule is small, let $W_1$ represent the
thermodynamic probability of this condition ($W_1 \ge 1$, by definition). For this particular case $W_1 = 1$, since only
one microscopic arrangement makes up this state. Recall that the gas molecules are dimensionless points, so
that permutations at a specific point are not possible. Because the probability that all molecules are in one
corner is very small, this is a rare event and has very low $S_{\text{classical}}$. When in equilibrium, any single molecule
can be anywhere in the container and the uncertainty in its position is large; the thermodynamic probability is
$W_2 > W_1$, so $S_{\text{classical}}$ is much larger and the entropy (in a thermodynamic sense) has increased. With information,
low probability of occurrence gives large self-information; the probability here is always less than 1. In other
words, W and normal probability are reciprocally related, so that uncertainty is the common thread. Thus,
average information, not information, and classical thermodynamic entropy vary similarly.
This similarity occurs only because of the intuitive constraints imposed on $I$ and $H$ at
the beginning of this chapter. Mathematically, both S and H are defined by density functions $f$ and $p$,
respectively:

$$S_{\text{classical}} = -(\text{constant}) \iint f \ln f \, d\mathbf{r}\, d\mathbf{v}$$

$$H(x) = -\sum_{i=1}^{m} p(x_i) \log p(x_i)$$

As a final remark, note that thermodynamic entropy increases and decreases as does $f$, which varies with the
number of states available to the system. As the boundary conditions (pressure, volume, temperature, etc.)
change so does the number of available states. After the number of states has been determined, one must also
find the distribution of particles among them. With $f$ now found, $S_{\text{classical}}$ is found by its formula. Therefore,
entropy, as we all know, is not an intuitive concept.

Example 1.3

Consider a source that produces symbols consisting of eight pulses. Treat each symbol as a separate 
message. Each pulse can have one of four possible amplitudes, and each message occurs with the same 
frequency. Calculate the information contained in any single message. 
The self-information is

$$\text{number of messages} = 4^8$$

$$I(x_i) = \log_2(4^8) = 16 \quad \text{bits/message}$$

The entropy in any message is

$$H(x) = \log_2(4^8) = 16 \quad \text{bits/message}$$

Here, $I(x_i)$ and $H(x)$ are equal, since all messages are equally likely. ▲

Now, I introduce some alternative units for information. If the base of the log is 2, the unit is bits. If the base 
is 10, the unit is hartleys. For natural logs (In), the unit is nats or nits. 
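The same quantity of information can be expressed in any of these units by changing the base of the logarithm. A minimal sketch, reusing the $4^8$ equally likely messages of example 1.3 (the variable names are illustrative):

```python
import math

num_messages = 4 ** 8  # eight pulses, four amplitudes each (example 1.3)

bits = math.log2(num_messages)       # base 2
hartleys = math.log10(num_messages)  # base 10
nats = math.log(num_messages)        # base e

print(f"{bits:.3f} bits = {hartleys:.3f} hartleys = {nats:.3f} nats")
```

The three numbers differ only by the constant factors $\log_{10} 2$ and $\ln 2$.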

Example 1.4 

Consider the English language to consist of 27 symbols (26 letters and 1 space). If each occurs at the same
frequency,

$$H = \sum_{i=1}^{N} \frac{1}{27} \log_2(27) = 4.755 \quad \text{bits/symbol}, \qquad N = 27$$

The actual frequency of occurrence yields H = 4.065 bits/symbol.


A key property of $H(x)$ is

$H(x)$ is a maximum when $p(x_1) = p(x_2) = \ldots = p(x_N)$

That is, all symbols occur with the same frequency,

$$H(x)\big|_{\max} = \log_2 N$$

where N is the total number of equally likely messages.

Example 1.5 

Show that $H(x)$ is a maximum when all $p(x_i)$ are equal.

$$H = -\sum_{i=1}^{N} p_i \log p_i = -(p_1 \log p_1 + p_2 \log p_2 + \ldots + p_N \log p_N)$$

Observe that for any term

$$d(p \log p) = \left(p\,\frac{1}{p} + \log p\right) dp = (1 + \log p)\, dp$$

Then,

$$dH = -[dp_1 (1 + \log p_1) + dp_2 (1 + \log p_2) + \ldots + dp_N (1 + \log p_N)]$$

Because

$$p_1 + p_2 + \ldots + p_N = 1$$

we have

$$dp_1 + dp_2 + \ldots + dp_N = 0 \qquad \text{(a)}$$

By using equation (a), $dp_N$ can be eliminated:

$$\begin{aligned}
dH &= -[dp_1 \log p_1 + dp_2 \log p_2 + \ldots + dp_N \log p_N] \\
&= -[dp_1 \log p_1 + dp_2 \log p_2 + \ldots + (-dp_1 - dp_2 - \ldots - dp_{N-1}) \log p_N]
\end{aligned}$$

Combining terms gives

$$-dH = dp_1 \log\left(\frac{p_1}{p_N}\right) + dp_2 \log\left(\frac{p_2}{p_N}\right) + \ldots + dp_{N-1} \log\left(\frac{p_{N-1}}{p_N}\right) \qquad \text{(b)}$$

Observe in equation (b) that $dp_1, dp_2, \ldots, dp_{N-1}$ are now completely arbitrary, since the constraint in
equation (a) has essentially been removed. In other words, $dp_N$ has been removed in equation (b). Inspection
shows that H is concave down ($\cap$), so that at the maximum $dH = 0$ and equation (b) gives

$$\log\frac{p_1}{p_N} = \log\frac{p_2}{p_N} = \ldots = \log\frac{p_{N-1}}{p_N} = 0$$

because the $dp_1, dp_2, \ldots, dp_{N-1}$ values are now arbitrary. Then,

$$\frac{p_1}{p_N} = \frac{p_2}{p_N} = \ldots = \frac{p_{N-1}}{p_N} = 1 \qquad \text{(c)}$$

or

$$p_1 = p_2 = \ldots = p_{N-1} \triangleq p$$

Note that because

$$p_N = 1 - \sum_{i=1}^{N-1} p = 1 - (N - 1)p$$

any term in equation (c) is

$$\frac{p}{1 - (N - 1)p} = 1$$

or rearrange to find

$$p = \frac{1}{N} \qquad \text{(d)}$$

▲
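The result of example 1.5 can also be spot-checked numerically. The sketch below is not a proof; it draws random probability distributions (the size `N = 5`, the seed, and the sample count are arbitrary assumptions) and confirms that none exceeds the entropy of the uniform distribution.

```python
import math
import random

def entropy(probs) -> float:
    """Entropy in bits/symbol; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def random_distribution(n: int, rng: random.Random):
    """Random probabilities that sum to 1, obtained by normalizing random weights."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

N = 5
rng = random.Random(0)  # fixed seed so the run is repeatable
h_uniform = entropy([1 / N] * N)
h_samples = [entropy(random_distribution(N, rng)) for _ in range(1000)]

print("uniform entropy:", h_uniform)
print("largest sampled entropy:", max(h_samples))
```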

This chapter defined the term “message” and introduced the intuitive constraints applied to the measure of 
information. Then, it showed the utility of the log of the number of permutations, and covered the blending of 
pulse, binary digit, and symbol used in information theory. Bit and baud were discussed, the term “self- 
information” was introduced, and the term “average information” (or entropy) was defined. After alluding to 
notational variations, the chapter discussed the links between information theory and classical thermodynamic 
entropy. The last example showed H(x) to be a maximum for equally likely outcomes. 
Chapter 2

Channel Transfer 


This chapter considers a discrete memoryless source (DMS) transmitting symbols (groups of binary digits) 
over a memoryless channel (fig. 2.1). The source emits symbols x that are impressed onto the transmitted 
waveform u, which then traverses the channel medium. The received waveform v is then demodulated, and the 
received sequence is denoted by y. How closely y matches x yields a measure of the fidelity of the channel. 
The word “channel” is loosely defined, in that it may include portions of modulators, demodulators, decoders, 
etc. In general, it means some portion between the source and sink of the communicating parties. The fidelity 
of the channel is represented as either a channel transition probability matrix or a channel transition diagram 
(fig. 2.2). In this figure, the term $p(y_j|x_i)$ means the conditional probability that $y_j$ is received in the ith time
slot, given that $x_i$ was transmitted in that slot (with the delay in the system appropriately considered). In
principle, these entries are determined by measurement on a given channel. Because of the property of
probabilities for exhaustive events, the sum over any row must be unity. It follows that a particular output, say
$y_n$, is obtained with probability

$$p(y_n) = \sum_{m=1}^{M} p(y_n|x_m)\, p(x_m) \qquad (2.1)$$

where $p(x_m)$ is the probability that $x_m$ was input to the channel. The entropy of the channel output is

$$H(y) = -\sum_{n=1}^{N} p(y_n) \log_2 p(y_n) \quad \text{bits/symbol} \qquad (2.2)$$

and the entropy of the output, given that a particular input, say $x_m$, was present, is

$$H(y|x_m) = -\sum_{n=1}^{N} p(y_n|x_m) \log_2 p(y_n|x_m)$$

When averaged over all possible inputs, the conditional entropy of the output given the input $H(y|x)$ is

$$H(y|x) = -\sum_{m=1}^{M} \sum_{n=1}^{N} p(x_m, y_n) \log_2 p(y_n|x_m) \quad \text{bits/symbol} \qquad (2.3)$$
Figure 2.1. Basic communications channel. (The source emits symbols $x_i$ from $x = \{x_1, x_2, \ldots, x_m\}$ (with shortened notation i), and at the receiver the symbol $y_j$ from $y = \{y_1, y_2, \ldots, y_n\}$ appears. The difference between $x_i$ and $y_j$ is the corruption added by the channel.)

Figure 2.2. Channel transition diagram (a) and alternative representation of channel transition matrix (b).


where the relationship

$$p(x_m, y_n) = p(y_n|x_m)\, p(x_m)$$

has been used for the probability that the joint event that the input was $x_m$ and the output was $y_n$ has occurred.
In a similar fashion, the conditional entropy $H(x|y)$ can be defined by replacing $p(y_n|x_m)$ by $p(x_m|y_n)$ in
equation (2.3).
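Equations (2.1) to (2.3) can be evaluated directly once the input probabilities and the transition matrix are given. The following is a minimal sketch; the two-input, two-output channel and its numbers are hypothetical, and rows of `transition` are indexed by input, columns by output.

```python
import math

def output_probs(p_x, transition):
    """Equation (2.1): p(y_n) = sum over m of p(y_n | x_m) p(x_m)."""
    n_out = len(transition[0])
    return [sum(p_x[m] * transition[m][n] for m in range(len(p_x)))
            for n in range(n_out)]

def entropy(probs):
    """Equation (2.2) applied to any probability list, in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(p_x, transition):
    """Equation (2.3): H(y|x), using p(x_m, y_n) = p(y_n | x_m) p(x_m)."""
    return -sum(p_x[m] * t * math.log2(t)
                for m, row in enumerate(transition)
                for t in row if t > 0)

p_x = [0.5, 0.5]            # hypothetical input probabilities
transition = [[0.9, 0.1],   # hypothetical p(y_n | x_1); each row sums to 1
              [0.2, 0.8]]   # hypothetical p(y_n | x_2)

p_y = output_probs(p_x, transition)
h_y = entropy(p_y)
h_y_given_x = conditional_entropy(p_x, transition)
print("p(y) =", p_y, " H(y) =", h_y, " H(y|x) =", h_y_given_x)
```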

Recall that entropy is the average amount of information; therefore, $H(x|y)$ is the average information about
x (channel input) given the observation of y at the receiver. This knowledge is arrived at after averaging over
all possible inputs and outputs. Because $H(x)$ is the entropy for the input symbol with no side information (not
knowing the channel output), it follows that the average information transferred through the channel is

$$I(x;y) = H(x) - H(x|y) \quad \text{bits/symbol} \qquad (2.4)$$

where $I(x;y)$ is defined as the mutual information. By Bayes' theorem,

$$I(x;y) = H(y) - H(y|x) \quad \text{bits/symbol} \qquad (2.5)$$

In either case, $I(x;y)$ can be written as

$$I(x;y) = \sum_{m=1}^{M} \sum_{n=1}^{N} p(x_m, y_n) \log_2 \frac{p(x_m, y_n)}{p(x_m)\, p(y_n)} \quad \text{bits/symbol} \qquad (2.6)$$

where $p(x_m, y_n) = p(y_n|x_m)\, p(x_m) = p(x_m|y_n)\, p(y_n)$ are the joint probabilities of the event that the channel
input is $x_m$ and its output is $y_n$.
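A short sketch of the double sum for mutual information follows. The function name and the two test channels (a noiseless channel and a channel whose output is independent of its input) are illustrative assumptions; they are chosen because their mutual information is known without calculation.

```python
import math

def mutual_information(p_x, transition):
    """I(x;y) in bits/symbol from input probabilities and p(y_n | x_m)."""
    n_out = len(transition[0])
    p_y = [sum(p_x[m] * transition[m][n] for m in range(len(p_x)))
           for n in range(n_out)]
    total = 0.0
    for m, px in enumerate(p_x):
        for n, py in enumerate(p_y):
            joint = px * transition[m][n]  # p(x_m, y_n)
            if joint > 0:
                total += joint * math.log2(joint / (px * py))
    return total

uniform = [0.5, 0.5]
noiseless = [[1.0, 0.0], [0.0, 1.0]]   # output always equals input
useless = [[0.5, 0.5], [0.5, 0.5]]     # output independent of input

print("noiseless:", mutual_information(uniform, noiseless))
print("useless:  ", mutual_information(uniform, useless))
```

The noiseless channel transfers the full source entropy of 1 bit/symbol, and the useless channel transfers nothing.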

By the theorem of total probability, the mutual information can be expressed as a function of the channel
input probabilities $p(x_m)$ and the channel transition probabilities $p(y_n|x_m)$. For a specified channel, the
transition terms are fixed and are presumed to be determined by experiment. With mutual information defined,
the maximum, which Shannon (1948) defined as the capacity of a channel, is

$$C = \max_{p(x_m)} I(x;y)$$

The channel capacity C is the maximum amount of information that can be conveyed through the channel 
without error if the source is matched to the channel in the sense that its output symbols occur with the proper 
probabilities such that the maximum mutual information is achieved. The $p(x_m)$ under the “max” in the
preceding equation means that the source is appropriately adjusted to achieve the maximum. The alteration of
the probabilities of the source’s output symbols $p(x_m)$ to maximize the probability of successful (error free)
transmission is assumed to occur by appropriate coding of the raw source output symbols. The early thrust in 
coding theory was to search for such optimum codes. 
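For a channel with only two inputs, the maximization over the source probabilities can be illustrated by a brute-force search. This is only a sketch under stated assumptions: the grid search, the step count, and the function names are illustrative choices, not a method taken from these notes, and a binary symmetric channel with crossover probability 0.1 is used as the test case.

```python
import math

def mutual_information(p_x, transition):
    """I(x;y) in bits/symbol from input probabilities and p(y_n | x_m)."""
    n_out = len(transition[0])
    p_y = [sum(p_x[m] * transition[m][n] for m in range(len(p_x)))
           for n in range(n_out)]
    total = 0.0
    for m, px in enumerate(p_x):
        for n, py in enumerate(p_y):
            joint = px * transition[m][n]
            if joint > 0:
                total += joint * math.log2(joint / (px * py))
    return total

def capacity_binary_input(transition, steps=1000):
    """Search p(x_1) = a on a uniform grid; return (largest I(x;y), best a)."""
    best_rate, best_a = 0.0, 0.0
    for k in range(steps + 1):
        a = k / steps
        rate = mutual_information([a, 1.0 - a], transition)
        if rate > best_rate:
            best_rate, best_a = rate, a
    return best_rate, best_a

bsc = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical symmetric channel, crossover 0.1
capacity, best_a = capacity_binary_input(bsc)
print("capacity estimate:", capacity, "at p(x_1) =", best_a)
```

For a symmetric channel the search settles on equally likely inputs; for an asymmetric transition matrix the best input probabilities generally differ from one half.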

Although developed for a discrete channel (finite number of inputs and outputs), $I(x;y)$ can be generalized
to channels where the inputs and outputs take on a continuum of values (the extreme of “soft” modulators and 
demodulators). 

An alternative approach to redeveloping equations (2.1) to (2.6) is to start with the reasonable definition of
joint entropy $H(x, y)$ (let $N = M = n$ for simplicity):

$$H(x, y) = -\sum_{i=1}^{n} \sum_{j=1}^{n} p(x_i, y_j) \log p(x_i, y_j) = -\sum_i \sum_j p(i, j) \log p(i, j)$$

If $x_i$ and $y_j$ are independent,

$$p(x_i, y_j) = p(x_i)\, p(y_j) = p(i)\, p(j)$$

Then,

$$\begin{aligned}
H(x, y) &= -\sum_i \sum_j p(i)\, p(j) \log[p(i)\, p(j)] \\
&= -\sum_i p(i) \log p(i) \sum_j p(j) - \sum_j p(j) \log p(j) \sum_i p(i) = H(x) + H(y)
\end{aligned}$$

If there is some dependence,

$$p(i, j) = p(i|j)\, p(j) = p(j|i)\, p(i)$$

then,

$$H(x, y) = -\sum_i p(i) \log p(i) \sum_j p(j|i) - \sum_i \sum_j p(i)\, p(j|i) \log p(j|i) = H(x) + H(y|x)$$

where

$$H(y|x) = -\sum_i \sum_j p(i, j) \log p(j|i)$$

is the conditional entropy. It is also called the equivocation of x about y or the equivocation of y given x. It can
be shown that

$$H(x, y) = H(x) + H(y|x) = H(y) + H(x|y)$$

Then, the mutual information is defined by

$$I(x;y) \triangleq H(x) - H(x|y) = H(y) - H(y|x) = \sum_i \sum_j p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} = I(y;x)$$


In my opinion, the key to enabling the subtraction of the equivocation from the self-entropy is just the additive 
property of entropy by its basic definition. Many variations on this theme are found in the literature; mutual 
information is sometimes called delivered entropy. Also, there are more axiomatic and perhaps more 
mathematically rigorous presentations, but I think that the above essentially covers the basic idea. The Venn 
type of diagram shown in figure 2.3 is sometimes used, and it can be helpful when following certain 
presentations. 
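The additive relationships among the entropy terms can be verified for any joint probability table. In the sketch below the 2-by-2 table `joint` is a hypothetical example (rows are inputs, columns are outputs), and the helper name `H` is an illustrative choice.

```python
import math

joint = [[0.30, 0.10],   # hypothetical p(x_i, y_j); all entries sum to 1
         [0.15, 0.45]]

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in joint]        # marginal of the input
p_y = [sum(col) for col in zip(*joint)]  # marginal of the output

H_xy = H([p for row in joint for p in row])
H_y_given_x = -sum(joint[i][j] * math.log2(joint[i][j] / p_x[i])
                   for i in range(2) for j in range(2) if joint[i][j] > 0)
H_x_given_y = -sum(joint[i][j] * math.log2(joint[i][j] / p_y[j])
                   for i in range(2) for j in range(2) if joint[i][j] > 0)

I_xy = H(p_x) - H_x_given_y  # mutual information
print("H(x,y) =", H_xy, " I(x;y) =", I_xy)
```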



Figure 2.3. Venn diagram for various entropy terms and their
relationship with mutual term $I(x;y)$.
For computational purposes, the relationships between logs are

$$\log_2 x = \frac{\log_{10} x}{\log_{10} 2} = 3.321928 \log_{10} x = 1.442695 \ln x$$
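These conversion constants are easy to confirm against a direct base-2 logarithm. A minimal sketch, with illustrative function names and arbitrary test values:

```python
import math

def log2_from_log10(x: float) -> float:
    """Base-2 log obtained from the common log and the constant 3.321928."""
    return 3.321928 * math.log10(x)

def log2_from_ln(x: float) -> float:
    """Base-2 log obtained from the natural log and the constant 1.442695."""
    return 1.442695 * math.log(x)

for x in (0.3, 2.0, 10.0, 27.0):
    print(x, math.log2(x), log2_from_log10(x), log2_from_ln(x))
```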


Example 2.1 

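The classic binary symmetric channel (BSC) with raw error (or crossover) probability p serves as an easy
demonstration of the procedures discussed above. The two symbols $x_1$ and $x_2$ have frequencies of occurrence
such that

$$p(x_1) = \alpha$$

$$p(x_2) = 1 - \alpha = \beta$$

and the channel transition matrix is

$$P = \begin{bmatrix} q & p \\ p & q \end{bmatrix} = \begin{bmatrix} p(y_1|x_1) & p(y_2|x_1) \\ p(y_1|x_2) & p(y_2|x_2) \end{bmatrix}$$

where $q = 1 - p$, since each row must sum to unity.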
The final objective is to determine the capacity, and the sequence of steps to find it is as follows: First, the
entropy of the source symbols $x_1$, $x_2$ is

$$H(x) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$$

Then, from the definition of conditional entropy,

$$H(y|x) = -\sum_{i=1}^{m} \sum_{j=1}^{n} p(x_i, y_j) \log[p(y_j|x_i)]$$

and using $p(x_i, y_j) = p(y_j|x_i)\, p(x_i)$,

$$\begin{aligned}
H(y|x) &= -\sum_{i=1}^{2} \sum_{j=1}^{2} p(x_i)\, p(y_j|x_i) \log p(y_j|x_i) \\
&= -\sum_{i=1}^{2} \left[ p(x_i)\, p(y_1|x_i) \log p(y_1|x_i) + p(x_i)\, p(y_2|x_i) \log p(y_2|x_i) \right] \\
&= -\{ p(x_1)\, p(y_1|x_1) \log p(y_1|x_1) + p(x_1)\, p(y_2|x_1) \log p(y_2|x_1) \\
&\qquad + p(x_2)\, p(y_1|x_2) \log p(y_1|x_2) + p(x_2)\, p(y_2|x_2) \log p(y_2|x_2) \} \\
&= -\{ \alpha q \log q + \alpha p \log p + \beta p \log p + \beta q \log q \} \\
&= -\{ (\alpha + \beta) q \log q + (\alpha + \beta) p \log p \} \\
&= -p \log p - q \log q \\
&\triangleq H_2(p)
\end{aligned}$$

Next, find $H(y)$:

$$H(y) = -\sum_{j=1}^{2} p(y_j) \log p(y_j)$$

Now,

$$p(y_1) = p(y_1|x_1)\, p(x_1) + p(y_1|x_2)\, p(x_2) = q\alpha + p\beta$$

$$p(y_2) = p(y_2|x_1)\, p(x_1) + p(y_2|x_2)\, p(x_2) = p\alpha + q\beta$$

Then,

$$H(y) = -(q\alpha + p\beta) \log(q\alpha + p\beta) - (p\alpha + q\beta) \log(p\alpha + q\beta)$$

Then, the mutual information is

$$I(x;y) = H(y) - H(y|x) = H_2(q\alpha + p\beta) - H_2(p)$$

where $H_2(u)$ is the entropy function for a binary source,

$$H_2(u) \triangleq -u \log u - (1 - u) \log(1 - u)$$

Figure 2.4, a sketch of $H_2(u)$, shows that $H_2(0.5) = H_2|_{\max}$ has a maximum of unity; thus, the channel capacity
is

$$C = 1 - H_2(p) = 1 + p \log p + (1 - p) \log(1 - p) \quad \text{bits/symbol}$$

where $\alpha = \beta = 1/2$ by observation of the plot.

Figure 2.4. Entropy function for binary source.

Then finally,

$$C = C_{BSC}$$

$$C_{BSC} = 1 + p \log p + (1 - p) \log(1 - p) \quad \text{bits/symbol}$$

▲
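The result of example 2.1 is compact enough to tabulate. The following sketch assumes base-2 logarithms (so the capacity is in bits/symbol); the function names and the list of crossover probabilities are illustrative choices.

```python
import math

def binary_entropy(u: float) -> float:
    """H_2(u) = -u log2(u) - (1 - u) log2(1 - u), with H_2(0) = H_2(1) = 0."""
    if u in (0.0, 1.0):
        return 0.0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)

def bsc_mutual_information(alpha: float, p: float) -> float:
    """I(x;y) = H_2(q*alpha + p*beta) - H_2(p), with q = 1 - p and beta = 1 - alpha."""
    q = 1.0 - p
    return binary_entropy(q * alpha + p * (1.0 - alpha)) - binary_entropy(p)

def bsc_capacity(p: float) -> float:
    """C = 1 - H_2(p), reached when alpha = beta = 1/2."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.2, 0.5):
    print(f"p = {p:<5} C = {bsc_capacity(p):.4f} bits/symbol")
```

Evaluating `bsc_mutual_information` at values of `alpha` other than 0.5 gives a smaller result than `bsc_capacity`, which is the numerical counterpart of reading the maximum off the plot of $H_2(u)$.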

Figure 2.5 gives channel capacity C versus the crossover probability p. The capacity can be given in various
units, namely bits per symbol, bits per binit (binit means binary digit), or bits per second. For example, if
p = 0.3, then C = 0.119 bit/symbol, which is the maximum entropy each symbol can carry. If the channel were
perfect, each symbol could carry the self-entropy $H(x)$, which is calculated by the size of the source’s alphabet
and the probability of each symbol occurring. The 30-percent chance of error induced by the channel can be
corrected by some suitable code, but the redundancy of the code forces each symbol to carry only 0.119 bit.
Another interpretation of C follows by assuming that each transmitted symbol carries 1 bit. Then, C is the
remaining information per symbol at the receiver. When p = 0.3, each received symbol carries only 0.119 bit.
This rather drastic loss of information (88.1 percent) for p = 0.3 occurs because, although only 30 percent are
in error, the receiver has no clue as to which ones. Thus, the code to tell the receiver which symbols are in error
takes up a large amount of overhead. In the original development of the theory, the symbols, which are
composed of binary digits, were assumed to be mapped by the modulator into some specific analog waveform
to be transmitted over the channel. If the received waveform were demodulated in error, the number of actual
binary digits in error could not be determined. Thus, errors are basically message errors, and the conversion
from message error to binary digit error is always vague.

Finally, consider the case for a continuous source (one that emits analog waveforms). The definition for the 
entropy is as before, with the summation going to the integral: 

$$H = -\int_{-\infty}^{\infty} p(x)\log[p(x)]\,dx$$

Figure 2.5.— Channel capacity C versus crossover probability p for binary symmetric channel.


The problem is to determine the form for p(x) that maximizes the entropy, under the constraint that the average
power is fixed (i.e., the variance is fixed for voltage waveforms). This constraint is

$$\int_{-\infty}^{\infty} x^2 p(x)\,dx = \sigma^2$$

The basic constraint for probability densities is

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

which forces p(x) to be Gaussian:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}$$

where σ^2 is the average signal power. Evaluating the integral for H gives the maximum entropy for an analog
source,

$$H_{max} = \frac{1}{2}\log(2\pi e\sigma^2)$$

The classic formula for channel capacity is arrived at by considering a theoretical code with signal power S.
The entropy of the code is

$$H_S = \frac{1}{2}\log(2\pi e S)$$

If the channel is the classical additive white Gaussian noise (AWGN) channel, with no fading or intersymbol
interference (ISI) allowed, the noise entropy is

$$H_N = \frac{1}{2}\log(2\pi e N)$$

where N is the average noise power. The capacity is thus

$$C = H_{S+N} - H_N = \frac{1}{2}\log\left(1 + \frac{S}{N}\right)$$

Now, the maximum information rate R_max is

$$R_{max} = \frac{C}{T}$$

where

$$T = \frac{1}{2W}$$

(W = bandwidth) is the Nyquist sampling interval for no ISI. Then,

$$R_{max} = \frac{\frac{1}{2}\log\left(1 + \frac{S}{N}\right)}{\frac{1}{2W}} = W\log\left(1 + \frac{S}{N}\right)$$

where N = N_0 W. Here, the noise power spectral density N_0 is in watts per hertz (single sided), and again the
constant average signal power is S. If

$$\frac{S}{N} = 2R\,\frac{E_b}{N_0}$$

where R = k/n is the code rate and E_b is the energy per information bit, then

$$C = W\log_2\left(1 + 2R\,\frac{E_b}{N_0}\right) \quad \text{bits/sec}$$

Here, k is the number of actual information symbols emitted by the source, and the encoder takes them and
outputs n total symbols (adds the appropriate coding baggage).
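
As a quick numerical illustration of the AWGN capacity C = W log2(1 + S/N), the sketch below evaluates the formula for a given bandwidth and signal-to-noise ratio. The function names and the sample values (a 3-kHz channel at 30 dB) are assumptions made here for illustration; they do not come from the notes.

```python
import math


def db_to_linear(value_db: float) -> float:
    """Convert a power ratio in decibels to a linear ratio."""
    return 10.0 ** (value_db / 10.0)


def awgn_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Capacity C = W * log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Hypothetical example values: W = 3 kHz, S/N = 30 dB.
    w = 3000.0
    snr = db_to_linear(30.0)
    print(f"C = {awgn_capacity(w, snr):.0f} bits/sec")
```

Doubling the bandwidth at fixed S/N doubles C, whereas doubling S at fixed W adds at most W bits/sec, which is why power and bandwidth trade off so differently.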

This classic capacity formula differed from the general opinions of the day in the following way:
Apparently, it was thought that the noise level of the channel limited the maximum information that could be
transmitted. Shannon’s (1948) formula shows that the rate of information transfer is actually bounded and is
related to the basic parameters of signal power S, bandwidth W, and noise power spectral density N_0. Another
measure of the upper limit for information transmission is called the cutoff rate. It is less than C and arises as
follows.

The capacity as defined earlier is the absolute upper limit for error-free transmission. However, the length of
the code and the time to decode the symbols to extract the desired message may be prohibitively long. The
cutoff rate serves as more of an implementation limit for practical decoders. It turns out that the number of
computations required to decode one information bit for a sequential decoder has asymptotically a Pareto
distribution, that is,

$$P(\text{comp} > N) < \beta N^{-\alpha}, \qquad N \gg 1$$

where “comp” means the number of computations and N is some large chosen number. The coefficient α is the
Pareto exponent, and it along with β (another constant) depends on the channel transition probabilities and the
code rate R. This relationship was found by Gallager (1968) and verified through simulation. The code rate and
the exponent are related by

$$R = \frac{E_0(\alpha)}{\alpha}$$

where

$$E_0(\alpha) = \alpha - \log_2\left[(1-p)^{1/(1+\alpha)} + p^{1/(1+\alpha)}\right]^{1+\alpha}$$

is the Gallager function for the BSC. The solution when α = 1 yields R = R_0, the computational cutoff rate. In
general, systems use 1 < α < 2. The value R_0 sets the upper limit on the code rate. For the binary input/
continuous output (very soft) case,

$$R_0 = 1 - \log_2\left(1 + e^{-RE_b/N_0}\right)$$

and for the discrete memoryless channel/binary symmetric channel case (DMC/BSC)

$$R_0 = 1 - \log_2\left[1 + 2\sqrt{p(1-p)}\right]$$

Then, the probability of a bit error is

$$P_b < \frac{2^{-KR_0/R}}{\left[1 - 2^{-(R_0/R - 1)}\right]^2}$$

where K is the constraint length of the encoder. The terms “sequential code” and “constraint length” will be
defined in chapters 6 and 7.
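
The relation between the Gallager function and the cutoff rate for the BSC can be checked numerically. The sketch below is an illustration under the assumption that the Gallager function takes the standard BSC form with equally likely inputs; the function names are hypothetical.

```python
import math


def gallager_e0_bsc(alpha: float, p: float) -> float:
    """Gallager function E0(alpha) for a BSC with crossover probability p."""
    s = 1.0 / (1.0 + alpha)
    return alpha - (1.0 + alpha) * math.log2((1.0 - p) ** s + p ** s)


def cutoff_rate_bsc(p: float) -> float:
    """Closed form R0 = 1 - log2(1 + 2*sqrt(p*(1 - p)))."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


def capacity_bsc(p: float) -> float:
    """BSC capacity, used here only to compare against R0 (0 < p < 1)."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        r0 = gallager_e0_bsc(1.0, p)  # R = E0(alpha)/alpha evaluated at alpha = 1
        print(f"p = {p:.2f}  R0 = {r0:.3f}  C = {capacity_bsc(p):.3f}")
```

Setting α = 1 in R = E_0(α)/α reproduces the closed-form R_0, and the printed values show R_0 falling below C, as the text states.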

Other variations on cutoff rate can be found. However, they often are involved with channel coding
theorems, etc., and most likely the discussion deals with upper bounds on message error rates. Two such
expressions are

$$P(\text{message error}) < C_R\,2^{-nR_0}, \qquad R < R_0$$

$$P(\text{message error}) < C_R\,2^{-KR_0}, \qquad R < R_0$$

The leading coefficients C_R are determined experimentally and depend on the channel and the code rate. The
exponent n is the block length for a block code, whereas K is the constraint length for a convolutional code.

Example 2.2 

What are the self-information and average information values in a coin-flipping experiment? Here, the
symbols are heads and tails. Then, self-information is

$$I(\text{head}) = -\log_2\left(\frac{1}{2}\right) = 1\ \text{bit}$$

and the average information is

$$H(\text{symbol}) = \underbrace{-\frac{1}{2}\log_2\left(\frac{1}{2}\right)}_{\text{head}}\ \underbrace{-\ \frac{1}{2}\log_2\left(\frac{1}{2}\right)}_{\text{tail}} = 1\ \text{bit/symbol}$$

so the entropy is 1 bit/symbol or 1 bit/flip. Other units, such as nats per flip or hartleys per flip, could also be
used by changing the base of the log.

▲ 
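
The entropy calculation of this example generalizes to any discrete source. The sketch below is illustrative only; the function name `entropy` and its `base` parameter are assumptions introduced here. Base 2 gives bits, base e gives nats, and base 10 gives hartleys.

```python
import math


def entropy(probabilities, base: float = 2.0) -> float:
    """Average information H = -sum(p * log(p)) of a discrete source."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probabilities if p > 0.0)


if __name__ == "__main__":
    coin = [0.5, 0.5]
    print(f"{entropy(coin):.3f} bit/flip")
    print(f"{entropy(coin, math.e):.3f} nat/flip")
    print(f"{entropy(coin, 10.0):.3f} hartley/flip")
```

A biased coin, for instance `entropy([0.9, 0.1])`, gives less than 1 bit/flip because the outcome is partly predictable.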


Example 2.3 

This is an interesting exercise on “units.” Consider transmitting the base 10 digits 0, 1, 2, ..., 9 by using a code
consisting of four binary digits. The code table is

     0    0000
     1    0001
     2    0010
     ...
    15    1111

Note that decimals 10 to 15 (corresponding to 1010 to 1111) never appear. The total number of symbols
N is 10 for this channel. Now, the self-information per symbol (assuming that all are equally likely) is

$$I(x_i) = \log_2 10\ \text{bits}$$

Then, forming the ratio of the number of information bits transmitted per binary digit gives

$$\frac{\text{information bits}}{\text{binary digit}} = \frac{\log_2 10}{4} = 0.83\ \frac{\text{bit}}{\text{binit}} \quad \text{or} \quad 0.83\ \frac{\text{bit}}{\text{bit}}$$

Here, binit stands for binary digit. Quite often, binit is shortened to “bit,” which gives the rather confusing unit
of “bit per bit.” Here, each binary digit carries only 0.83 “information bit” (or self-information) because only
10 of the possible 16 sequences are used. The value 0.83 is further reduced by propagation over the channel
after being acted upon by the channel transition probabilities.

Similarly, the capacity for the binary symmetric channel can be written as

$$C = 1 + p\log_2 p + (1-p)\log_2(1-p) \quad \text{bits/binit or bits/symbol}$$

where the latter units are information bits per symbol. The capacity can also be given as a rate as was done for
the AWGN channel:

$$C = W\log_2\left(1 + \frac{S}{N}\right) \quad \text{bits/sec}$$

Thus, one must be aware of the possible confusion about what “bit” means. If one is just talking about the baud
of a channel (the number of symbols transmitted per second), information content is not considered. The term
“bits per second” is then a measure of system speed, and information is not the issue. The bits in this case are
just binary symbols that the modem can handle, and any pseudorandom bit stream can pass and carry
absolutely no information.


Example 2.4 

Approximately eight basic channel models are used in the literature. 


1. Lossless 

2. Deterministic 

3. Ideal 

4. Uniform 

5. Binary symmetric (BSC) 

6. Binary erasure (BEC) 

7. General binary (GBC) 

8. M-ary symmetric 

For the lossless channel the probability matrix contains only one nonzero element in each column:



Here, C = log Q, where Q is the number of source symbols. 

The deterministic channel has only one nonzero element in each row:

[Channel diagram and transition matrix p(y|x) for six inputs x_1, ..., x_6 and three outputs y_1, y_2, y_3; each row of p(y|x) contains a single 1.]

Here, C = log Z, where Z is the number of output symbols.
The ideal channel is 




Here, C = log Q = log Z, where Q and Z are the number of source and output symbols, respectively. 

In the uniform channel model, every row and column is an arbitrary permutation of the probabilities in the
first row:

$$p(y|x) = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 1/4 & 1/2 & 1/4 \\ 1/4 & 1/4 & 1/2 \end{bmatrix} \qquad \text{(rows } x_1, x_2, x_3\text{)}$$

Here,

$$C = \log Q + \sum_{n=1}^{Q} p(y_n|x_m)\log p(y_n|x_m)$$

where Q is the number of input symbols.

The BSC model uses the formula for the uniform channel model: 



Here, C = log 2 2 + p log p + q log q. 

For the BEC model, the middle output y_2 is the erasure. An erasure is a demodulator output that informs the
user that the demodulator could not guess at the symbol.

Here, C = p.

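For the GBC model

Here,

$$C = \log\sum_{i=1}^{m} 2^{x_i}$$

One must find the x_i by solving the following set:

$$p_{11}x_1 + \cdots + p_{1m}x_m = \sum_{i=1}^{m} p_{1i}\log p_{1i}$$

$$\vdots$$

$$p_{m1}x_1 + \cdots + p_{mm}x_m = \sum_{i=1}^{m} p_{mi}\log p_{mi}$$

Solve for x_i, i = 1, ..., m. Alternatively,

$$C = \frac{-\beta H_2(\alpha) + \alpha H_2(\beta)}{\beta - \alpha} + \log_2\left[1 + 2^{[H_2(\alpha) - H_2(\beta)]/(\beta - \alpha)}\right]$$

The M-ary symmetric channel sends not binary symbols but M-ary ones, where M is an integer.

$$p(y|x) = \begin{bmatrix} p & \frac{1-p}{M-1} & \cdots & \frac{1-p}{M-1} \\ \frac{1-p}{M-1} & p & \cdots & \frac{1-p}{M-1} \\ \vdots & & \ddots & \vdots \\ \frac{1-p}{M-1} & \frac{1-p}{M-1} & \cdots & p \end{bmatrix}$$

Here, C = log M − (1 − p) log(M − 1) + p log p + (1 − p) log(1 − p).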

▲ 

Example 2.5

Assume a well-shuffled deck of 52 cards. How much entropy is revealed by drawing a card? Since any card
is equally likely,

$$H = \log_2 52 = 5.7\ \text{bits} = \ln 52 = 3.95\ \text{nats} = \log_{10} 52 = 1.716\ \text{hartleys}$$

Chapter 3

Mathematical Preliminaries 

3.1 Modulo-2 Arithmetic 

Modulo-2 arithmetic is defined in the two tables below.

    Addition ⊕ | 0  1        Multiplication · | 0  1
    -----------+------       -----------------+------
         0     | 0  1               0         | 0  0
         1     | 1  0               1         | 0  1


The dot (or inner) product of two sequences is defined as 

$$(10110)\cdot(11100) = 1\cdot1 \oplus 0\cdot1 \oplus 1\cdot1 \oplus 1\cdot0 \oplus 0\cdot0 = 1 \oplus 0 \oplus 1 \oplus 0 \oplus 0 = 0$$
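
The modulo-2 dot product is simply a bitwise AND followed by a parity (XOR) sum. A minimal sketch follows; the function name `dot_mod2` and the use of digit strings are choices made here for illustration.

```python
def dot_mod2(a: str, b: str) -> int:
    """Modulo-2 dot (inner) product of two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # AND corresponds to modulo-2 multiplication; the sum is reduced modulo 2.
    return sum(int(x) & int(y) for x, y in zip(a, b)) % 2


if __name__ == "__main__":
    print(dot_mod2("10110", "11100"))
```

A result of 0 means the two sequences share an even number of positions where both hold a one.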

A sequence of modulo-2 digits has no numerical significance in most coding discussions. However, a 
polynomial representation for a string of binary digits is used universally; for example, 

$$11010011 \leftrightarrow 1 \oplus x \oplus x^3 \oplus x^6 \oplus x^7$$

Here, a one in the ith position means the term x^i is in the polynomial. Here, i starts at zero on the left, but just
the opposite notation is used in many treatments in the general literature. The polynomials (like integers) form
a ring and factoring is an important property; for example,

$$x^7 \oplus 1 = (x \oplus 1)(x^3 \oplus x \oplus 1)(x^3 \oplus x^2 \oplus 1)$$

Because the factoring is not readily apparent, tables of such factorizations are often needed. Multiplication of 
two sequences is best done by just multiplying the polynomials and then transforming back to binary digits as 
the following example shows. 

Example 3.1 

Multiply 101101 by 1101.

$$101101 \leftrightarrow 1 \oplus x^2 \oplus x^3 \oplus x^5, \qquad 1101 \leftrightarrow 1 \oplus x \oplus x^3$$

$$(1 \oplus x^2 \oplus x^3 \oplus x^5)(1 \oplus x \oplus x^3) = 1 \oplus x^2 \oplus x^3 \oplus x^5 \oplus x \oplus x^3 \oplus x^4 \oplus x^6 \oplus x^3 \oplus x^5 \oplus x^6 \oplus x^8$$

$$= 1 \oplus x \oplus x^2 \oplus x^3 \oplus x^4 \oplus x^8 \leftrightarrow 111110001$$

Note that

$$x^3 \oplus x^3 = 0$$
$$x^5 \oplus x^5 = 0$$

etc.

▲
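
The multiplication in this example can be reproduced with a few lines of code. The sketch below follows the convention of the text (the leftmost digit is the coefficient of x^0); the function name `gf2_multiply` is an assumption introduced for illustration.

```python
def gf2_multiply(a: str, b: str) -> str:
    """Multiply two binary sequences as polynomials with modulo-2 coefficients.

    The leftmost digit of each string is the coefficient of x^0.
    """
    result = [0] * (len(a) + len(b) - 1)
    for i, a_bit in enumerate(a):
        if a_bit != "1":
            continue
        for j, b_bit in enumerate(b):
            if b_bit == "1":
                # x^i * x^j = x^(i+j); equal powers cancel in pairs (XOR).
                result[i + j] ^= 1
    return "".join(str(bit) for bit in result)


if __name__ == "__main__":
    print(gf2_multiply("101101", "1101"))
```

The same routine confirms the factorization of x^7 ⊕ 1 by multiplying the three factors written as the strings `"11"`, `"1101"`, and `"1011"`.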

Modulo-2 addition and multiplication are associative and commutative. A product of an n_1-digit sequence and
an n_2-digit sequence has (n_1 + n_2 − 1) digits.

Modulo-2 division is just like ordinary algebra; for example, (x^3 ⊕ 1) ÷ (x^2 ⊕ x):

                    x ⊕ 1              ← quotient
              _______________
    x^2 ⊕ x  | x^3        ⊕ 1
               x^3 ⊕ x^2
               ---------
                     x^2      ⊕ 1
                     x^2 ⊕ x
                     -------
                           x ⊕ 1       ← remainder
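
Modulo-2 long division can be sketched compactly with integers used as bit masks. Note the assumption made here for convenience: bit i of each integer is the coefficient of x^i (so x^3 ⊕ 1 is `0b1001`), which is the reverse of the left-to-right string convention used in the text. The function name `gf2_divmod` is hypothetical.

```python
def gf2_divmod(dividend: int, divisor: int):
    """Divide polynomials with modulo-2 coefficients; return (quotient, remainder).

    Bit i of each integer is the coefficient of x^i.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    remainder = dividend
    divisor_len = divisor.bit_length()
    while remainder.bit_length() >= divisor_len:
        shift = remainder.bit_length() - divisor_len
        quotient ^= 1 << shift          # record the term x^shift in the quotient
        remainder ^= divisor << shift   # subtract (XOR) the shifted divisor
    return quotient, remainder


if __name__ == "__main__":
    q, r = gf2_divmod(0b1001, 0b110)    # (x^3 + 1) / (x^2 + x)
    print(bin(q), bin(r))
```

For the example above, both the quotient and the remainder come out as `0b11`, that is, x ⊕ 1.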


Convolution of two sequences is as follows: (1101)*(10011)

Step 1: alignment (here, 1101 is reversed and slid past 10011 one position at a time)

Step 2: 1·1 = 1 (first term in result)

Step 3: 0·1 ⊕ 1·1 = 0 ⊕ 1 = 1

Step 4: 0·1 ⊕ 0·1 ⊕ 1·0 = 0

Step 5: 1·1 ⊕ 0·1 ⊕ 0·0 ⊕ 1·1 = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0

Step 6: 1·1 ⊕ 1·1 ⊕ 0·0 ⊕ 0·1 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0

Step 7: 1·1 ⊕ 1·0 ⊕ 0·1 = 1

Step 8: 1·0 ⊕ 1·1 = 1

Step 9: 1·1 = 1 (last term in result)

∴ (1101)*(10011) = (11000111)
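
The sliding procedure above is the usual discrete convolution with the sums reduced modulo 2. A minimal sketch, with the function name `convolve_mod2` chosen here for illustration:

```python
def convolve_mod2(a, b):
    """Modulo-2 convolution of two lists of binary digits."""
    out = []
    for n in range(len(a) + len(b) - 1):
        acc = 0
        for i, a_bit in enumerate(a):
            j = n - i                    # index into b for this overlap position
            if 0 <= j < len(b):
                acc ^= a_bit & b[j]      # AND = product, XOR = modulo-2 sum
        out.append(acc)
    return out


if __name__ == "__main__":
    print(convolve_mod2([1, 1, 0, 1], [1, 0, 0, 1, 1]))
```

The result has 4 + 5 − 1 = 8 digits and is identical to the product of the two corresponding polynomials, which is why convolution and polynomial multiplication are interchangeable in coding work.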


3.2 Channel Terminology 

The terminology is not always clear in that different portions of the communications system may be 
included in the “channel.” A basic block diagram for a block-coded channel is given in figure 3.1. Here, the 
channel symbols are just n-bit binary sequences (i.e., 1011...). Figure 3.2 represents a system wherein the 
symbols are strings of binary digits and all strings are of a specified length. This figure then shows the 
additional boxes that convert from binary digits to strings of such digits. 

The “channel” is often just the transmission medium, but “channel symbols” are emitted from the
demodulator, so that the channel boundaries are fuzzy. The inputs to the encoder are k-bit messages, source
bits, information bits, etc. The encoder emits n-bit code words, code bits, channel bits, bauds, channel
symbols, etc. A more detailed model is given in figure 3.3 for convolutional codes.

Next, the message energy is defined to be

$$E_m = \int_0^{T} S^2(t)\,dt$$

where S(t) is shown in figure 3.4 and T is the duration of the message. The received energy in a binit (called a bit) is

$$E_b = \frac{E_m}{k} \quad \text{energy/data bit or energy/bit}$$

Figure 3.1.— Basic communications system using block coding.

Figure 3.2.— More detailed block diagram for basic block-coded channel.

Figure 3.3.— Communications channel using convolutional coding. (The source and sink are separated by the
convolutional codec, interleaver/deinterleaver, synchronization units, and modulator/demodulator pair.)


In block coding, the source data are segmented in k-bit blocks and passed to the encoder. The encoder
calculates some parity check bits (by modulo-2 addition of some of the k bits) and outputs the original k bits
along with the check bits. The number of binits (bits) from the encoder is n; thus, an (n,k) encoder. In most
cases, the n bits are constrained to the same time interval as are the original k bits. Thus, the channel bits
contain less energy than do the source bits. In other words, the message energy in either a
k-bit source sequence or an n-bit channel sequence is the same.

$$E_m = kE_b = nE_s$$

where E_s is the received energy in the channel symbol. The quantities R, r, R_s, n, and k are related by

$$r = \frac{k}{n} = \frac{R}{R_s}$$

where

    r      code rate or code efficiency
    R      data rate or information symbol rate, bits/sec
    R_s    symbol rate, channel symbol rate, chip rate, baud, etc.

Figure 3.4.— Arbitrary message waveform.

Thus, coding increases the bandwidth as well as the number of errors emitted from the demodulator. The
increase in errors is due to the reduced energy per pulse now available to the demodulator. When the coding is
turned off, the demodulator makes decisions on energy values E_b, whereas with coding the decisions are made
on E_s, and E_s < E_b, where

$$E_s = \frac{k}{n}E_b$$
The correction capability of the code overcomes the extra demodulator errors. At the receiver, let

P = received power in modulated signal = E_s R_s

Then the signal-to-noise ratio (SNR) is

$$\frac{P}{N_0} = \frac{E_s R_s}{N_0} = \frac{E_b R}{N_0}$$

or

$$\frac{E_b}{N_0} = \frac{P}{N_0}\,\frac{1}{R}$$

From a coding point of view, the system appears as shown in figure 3.5. The message sequence m enters the
encoder and is mapped into the code vector sequence u. After propagation through the channel, the decoder
acts on the sequence z and outputs the estimate m̂, and one assumes that m̂ = m with high probability. Systematic
encoders, which produce code words of the form indicated in figure 3.5, are considered in most cases.
Systematic means that the original message bits are preserved (kept in order) and the parity bits are appended
to the end of the string.

Figure 3.5.— Basic block coder/decoder system. (The code vector u is corrupted by the channel's noise
vector e, so that z = u ⊕ e. The decoder attempts to remove e and recover the message.)

Figure 3.6.— Soft-decision decoder, here quantized to standard three bits or eight levels (analog-to-digital
outputs 000, 001, 010, 011, 100, 101, 110, 111). (The null zone is used by the demodulator to alert the decoder
that the particular bit is completely uncertain or has been "erased.")

Figure 3.5 summarizes the basic steps used in error correction. The message vector, say m = 1011, is
mapped into the code vector u = 1011001 by the encoder circuit. The received vector z is the modulo-2
addition of the transmitted sequence u and the error vector e added by the channel. The task of the decoder
may be listed as follows:

1. Is e = 0?

2. If e ≠ 0, determine e.

3. Develop, or reconstruct, an estimate ê by some decoding algorithm. Hope that ê = e.

4. Remove the effect of the corruption due to e by just adding the estimate to the received vector z; z + ê = u + e + ê.

5. If ê = e, the decoding is successful and the error is corrected.

Obviously, step 3 is the key one in the procedure. How to perform it is essentially the basic task of decoding
techniques.

When the demodulator outputs binary (hard) decisions, it gives the decoder the minimal amount of
information available to decide which bits might be in error. On the other hand, a soft decision gives the
decoder information as to the confidence the demodulator has in the bit. In other words, a hard-decision
demodulator outputs just two voltage levels corresponding to one or zero. A soft-decision demodulator, on the
other hand, generally outputs three-bit words that give the location of the best estimate of the signal (fig. 3.6).
In other words, the output 000 corresponds to a strong zero, whereas 011 corresponds to the weakest zero.
Similarly, 111 corresponds to a strong chance that a one was transmitted. Another demodulator output is the
null zone, or erasure output. When the signal is about equidistant from either a one or a zero, the demodulator
sends a special character to alert the decoder that the bit’s value is essentially uncertain.

Chapter 4

Block Codes 


This chapter covers the basic concepts of block codes; chapter 5, a “second pass,” adds much of the detail 
needed for in-depth understanding. 

The forward-error-correcting (FEC) encoder accepts k information bits and outputs n bits. The n − k added
bits are formed by modulo-2 sums of a particular set of the k input bits. The output blocks of n bits are the code
words. For n-tuples consisting of binary digits, there are 2^n distinct n-tuples. Of these, only 2^k are chosen as
permissible code words. Let u_i and u_j be code vectors. The code is linear if u_i ⊕ u_j is also a code word. A
linear block code is a set of 2^k n-tuples (a vector subspace; i.e., a subset of the possible 2^n n-tuples). Figure 4.1
illustrates the concept of selecting code words from the entire vector space. The large dots represent the code
words, and the small dots represent possible received vectors, which are code words corrupted by the channel
(i.e., noise vectors added to code vectors). The code words should be widely separated (i.e., the sequences of
ones and zeros should be as different looking as possible) to minimize decoder errors. It would be preferable
if k ≈ n, but generally 2^k ≪ 2^n for good codes.

Example 4.1 

Assume that the code word u was sent and that the channel creates two errors; that is, let 

u = 1011001 
e = 0010001 

Then, z = u ⊕ e = 1001000. Somehow the decoder must recognize that z is not a possible code vector
and then determine e.

▲ 

The basic idea is to generate a code that permits the decoder to perform its function. Two matrices are 
developed that keep track of the digital strings that make up the code; these are the code generator G and the 
parity check H . Although they are generally developed in parallel, G is discussed first. 

Let the set {v_1, v_2, ..., v_k} form a basis in the subspace; then define the code generator G as the matrix whose
rows are these basis vectors; for example,

$$G = \begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix}$$

The generated code word is u = mG, where m is the message vector that defines the operation of the encoder.

Figure 4.1.— Schematic for n-dimensional vector space. (The large dots are the set of n-tuples that form the
code. The unused vectors may appear at the decoder if the demodulator makes an error.)


Example 4.2 
For a (6,3) code, choose

$$G = \left[\begin{array}{ccc|ccc} 1&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 1&0&1&0&0&1 \end{array}\right]$$

Note that the rank of G is k. Also note that the last three columns form the identity matrix. The code is
systematic if

$$G = [\,P \mid I_k\,]$$

where P is the parity array portion. The code word is

$$u = (\underbrace{p_1, \ldots, p_{n-k}}_{(n-k)\ \text{parity bits}},\ \underbrace{m_1, \ldots, m_k}_{k\ \text{message bits}})$$

Note that here the “block” is turned around (i.e., the parity check bits are first and the message bits follow).
Both forms of G are used in the literature:

$$G = [\,P \mid I_k\,] \quad \text{or} \quad G = [\,I_k \mid P\,]$$

The code word set is the row space of G. The all-zero word is always a code word.


▲ 


Example 4.3 

For a linear (5,3) code, choose

$$G = \begin{bmatrix} 1&0&0&1&1 \\ 0&1&0&1&0 \\ 0&0&1&0&1 \end{bmatrix}$$

The number of code words is 2^k = 2^3 = 8. All of the code words are

    00000  [10011]  [01010]  10110  [00101]  11001  01111  11100

where the bracketed ones form the basis. The code has k (here three) dimensions. Arrange
these basis code words as

$$G = \left[\begin{array}{ccc|cc} 1&0&0&1&1 \\ 0&1&0&1&0 \\ 0&0&1&0&1 \end{array}\right] = [\,I_3 \mid P\,], \qquad \therefore\ P = \begin{bmatrix} 1&1 \\ 1&0 \\ 0&1 \end{bmatrix}$$

Only the P matrix distinguishes one (n,k) code from another. Encode via G as follows:

$$C = v_m G$$

where v_m is the message vector and C is the code word. Let v_m = (101). Then,

$$(101)\begin{bmatrix} 1&0&0&1&1 \\ 0&1&0&1&0 \\ 0&0&1&0&1 \end{bmatrix} = 10110 = C$$

Observe that C = [v_m  v_m P], which means that the message is transparent or that the code is systematic.

▲ 
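
The encoding rule C = v_m G amounts to adding (modulo 2) the rows of G selected by the ones in the message. The sketch below applies it to the (5,3) generator of this example; the helper names `encode` and `all_code_words` are introduced here for illustration only.

```python
from itertools import product

# Generator matrix of the (5,3) example, G = [I_3 | P].
G = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
]


def encode(message, generator):
    """Return the code word message * generator with modulo-2 arithmetic."""
    word = [0] * len(generator[0])
    for bit, row in zip(message, generator):
        if bit:
            word = [w ^ r for w, r in zip(word, row)]  # add the selected row
    return word


def all_code_words(generator):
    """Encode every one of the 2^k possible messages."""
    k = len(generator)
    return [encode(m, generator) for m in product((0, 1), repeat=k)]


if __name__ == "__main__":
    print(encode([1, 0, 1], G))
    for word in all_code_words(G):
        print("".join(map(str, word)))
```

Because G is in systematic form, the first three digits of every printed code word repeat the message itself.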


4.1 Standard Array 

The “standard array” is a table that describes the partitioning of received sequences such that a decoding 
strategy can be applied. The table is constructed as follows: The first row starts with the all-zero code word on 
the left, and all remaining code words, arranged in any order, fill out the row. Next, choose an error pattern and 
place it under the all-zero code word. For example, consider a (6,3) code generated by 


$$G = \begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix}$$

The first row of the standard array is then

    000000  011100  101010  110001  110110  101101  011011  000111

where the code words are found by using the 2^k = 2^3 = 8 message vectors. That is, for m = 101

$$[1\ 0\ 1]\begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix} = 101101 = \text{code word}$$

which is the sixth entry in the row. Note that the error pattern chosen for the next row cannot be any of the
entries in the first row. Choose 100000 and add it to all the entries in the first row and place these sums under
the code word used; that is, the first code word is 011100 and adding it to 100000 gives 111100, which is
placed under 011100. The table is

    000000  011100  101010  110001  110110  101101  011011  000111
    100000  111100  001010  010001  010110  001101  111011  100111


Choose another error pattern (which does not appear anywhere in the table yet) and form the next row. For
010000, the table is

    000000  011100  101010  110001  110110  101101  011011  000111
    100000  111100  001010  010001  010110  001101  111011  100111
    010000  001100  111010  100001  100110  111101  001011  010111

In this manner, the table becomes

    000000 | 011100  101010  110001  110110  101101  011011  000111
    -------+-------------------------------------------------------
    100000 | 111100  001010  010001  010110  001101  111011  100111
    010000 | 001100  111010  100001  100110  111101  001011  010111
    001000 | 010100  100010  111001  111110  100101  010011  001111
    000100 | 011000  101110  110101  110010  101001  011111  000011
    000010 | 011110  101000  110011  110100  101111  011001  000101
    000001 | 011101  101011  110000  110111  101100  011010  000110
    100100 | 111000  001110  010101  010010  001001  111111  100011


Observe that the table has 2^n = 2^6 = 64 entries, which are all of the possible 6-tuples. The code words are on
the first row, and there are 2^k = 2^3 = 8 of them. The error patterns with the fewest number of ones (hence,
fewest errors) form the first column. The last entry was found by inspecting the table and choosing a vector
with the fewest ones that was not in the table. The rows of the table are called cosets. The entries in the first
column are called coset leaders. The entry in any row is the sum of that row’s coset leader and the code word
at the top of the column. All entries to the right of the vertical line and below the horizontal one represent all
possible received vectors. A decoding scheme would choose the code word at the top of the column as the most
likely one sent. Recall that the coset leaders are chosen to be the most likely error patterns. There are 2^(n−k) = 8
cosets and each coset contains 2^k = 8 n-tuples. Suppose the received vector is 101100 (which is the sixth entry in
the coset whose leader is 000001); then, a maximum-likelihood decoder (MLD) would choose 101101 (the column
header) as the probable code word.
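
The construction of the standard array can be automated. The sketch below is one possible implementation, not the author's: it tries candidate coset leaders in order of increasing weight (and, within a weight, from the leftmost bit position), which is an assumption about tie-breaking made here. For the (6,3) generator above it happens to reproduce the same coset leaders as the table.

```python
from itertools import combinations

N = 6
G_ROWS = ["011100", "101010", "110001"]  # generator rows of the (6,3) code


def code_words(rows):
    """All modulo-2 sums of the generator rows, as integers."""
    words = [0]
    for row in rows:
        r = int(row, 2)
        words += [w ^ r for w in words]
    return words


def standard_array(rows, n):
    """Build the array row by row; row 0 holds the code words."""
    words = code_words(rows)
    seen = set(words)
    table = [words]
    for weight in range(1, n + 1):               # lightest error patterns first
        for positions in combinations(range(n), weight):
            leader = sum(1 << (n - 1 - pos) for pos in positions)
            if leader in seen:
                continue                          # already appears in the table
            coset = [leader ^ w for w in words]
            table.append(coset)
            seen.update(coset)
    return table


def decode(received, table):
    """Return the code word heading the column that contains `received`."""
    for coset in table:
        if received in coset:
            return table[0][coset.index(received)]
    raise ValueError("vector not in table")


if __name__ == "__main__":
    array = standard_array(G_ROWS, N)
    for coset in array:
        print("  ".join(format(v, "06b") for v in coset))
    print(format(decode(int("101100", 2), array), "06b"))
```

The order of the code words in the first row differs from the listing in the text, but the cosets and the decoding decisions are the same.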

In summary, the table as described would operate as follows: The decoder would recognize the first row as
valid code words and pass them on. If any of the vectors in the fourth quadrant of the table (49 entries) are
received, the decoder can process them and determine the coset leader (error pattern). Adding the error pattern
e to the received vector will generate the code word at the top of the column. The last coset leader (100100) is
the only double-error pattern discernible. Thus, for this special case the decoder can detect and correct all
single-error patterns and one double-error pattern (the last coset leader). If any other double-error pattern
occurs, the decoder will make a mistake. In other words, the decoder formed from this table is able to
recognize just the errors that form the first column. The array gives a good intuitive understanding of decoding
strategy and the ways errors can pass undetected. Note that the code is not just single-error correcting but can
correct the given double-error pattern in the last row. In other words, the correctable patterns do not always fall
into easily quantified limits.

Example 4.4


Choose an (n,k) = (6,3) code and let G be

$$G = \begin{bmatrix} 1&0&1&1&0&0 \\ 1&1&1&0&1&0 \\ 0&1&1&0&0&1 \end{bmatrix}$$

which is slightly different from the G used in the previous discussion. Here,

$$P = \begin{bmatrix} 1&0&1 \\ 1&1&1 \\ 0&1&1 \end{bmatrix}$$

The 2^k code vectors are the three rows of G and their ⊕ sums. Thus, the code words are

    101100  111010  011001  010110
    100011  110101  001111  000000

The table is

    000000  001111  110101  100011  010110  011001  111010  101100
    000001  001110  110100  100010  010111  011000  111011  101101
    000010  001101  etc.
    000100  001011  etc.
    001000  000111  etc.
    etc.

Definitions 

The following definitions are needed for further discussion: Hamming weight w(u) is the number of ones in a
code word u; for example,

$$w(001101) = 3$$

Hamming distance d(u,v) is the number of places by which the code vectors u and v differ, or the number of
bit changes to map u into v. Let

    u = 110110
    v = 100101

$$d(u,v) = 3$$

It turns out that

$$d(u,v) = w(u \oplus v) = w(010011) = 3$$

Figure 4.2.— Hamming distance. (If u is transmitted and if either of the vectors to the left of the dashed line
are decoded, u is chosen. If either of the vectors to the right of the dashed line are decoded, v is chosen and an
error occurs.)

The minimum Hamming distance is the distance between the two closest code words; it is also the weight
of the “lightest” nonzero code word. The error correction power of a given code is determined by d_min. The number of
correctable errors t in a received word is

$$t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor$$

This equality follows from “nearest neighbor decoding,” which says that the received word is decoded into the
code word “nearest” in Hamming distance (fig. 4.2).
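
These definitions translate directly into code. The following sketch (function names are illustrative assumptions) computes the weight, the distance, the minimum distance of a code, and the resulting number of correctable errors t, using the code words of the (6,3) code from the standard array discussion.

```python
from itertools import combinations


def weight(word: str) -> int:
    """Hamming weight: the number of ones in the word."""
    return word.count("1")


def distance(u: str, v: str) -> int:
    """Hamming distance: the number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def minimum_distance(words) -> int:
    """Smallest distance between any two distinct code words."""
    return min(distance(u, v) for u, v in combinations(words, 2))


def correctable_errors(d_min: int) -> int:
    """t = floor((d_min - 1) / 2)."""
    return (d_min - 1) // 2


if __name__ == "__main__":
    code = ["000000", "011100", "101010", "110001",
            "110110", "101101", "011011", "000111"]
    d_min = minimum_distance(code)
    print(weight("001101"), distance("110110", "100101"))
    print(d_min, correctable_errors(d_min))
```

For a linear code the pairwise search is not strictly necessary, since d_min equals the smallest weight among the nonzero code words.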


Example 4.5 

Assume that the transmitted code word is u = 10001 and that the received word is z = 10010. Then, since
z = u ⊕ e,

$$e = z \oplus u = 00011$$

The ones in e correspond to the bits in error. Define t to be the weight of e. Here, t = 2; thus,

$$t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor$$

implies that d_min should be 5 or 6 to correct all possible double-error combinations in any code word.
In an erasure, the error location is known, but no hint is given as to the bit value; for example,

    z = 1101_101
            ↑
            erasure (a glitch such that digit is erased)

Then, define

    e_c    number of errors corrected
    e_d    number of errors detected
    ρ      number of erasures corrected
    x      number of erasures

It follows that

$$d_{min} \ge e_c + e_d + 1 \quad (e_d \ge e_c), \qquad d_{min} \ge x + 2e_c + 1, \qquad d_{min} \ge \rho + 1$$

In the design phase, choose e_c + e_d for the available d_min, which freezes the decoder design. It can be shown
that

$$d_{min} \le n - k + 1$$


4.2 Parity Check Matrix 

The parity check matrix helps describe the code structure and starts the decoding operations. For a
given generator

$$G = [\,P \mid I_k\,]$$

the parity check is given by

$$H = [\,I_{n-k} \mid P^T\,]$$

For example,

$$G = \left[\begin{array}{ccc|ccc} 1&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 1&0&1&0&0&1 \end{array}\right]$$

Then,

$$H = \left[\begin{array}{ccc|ccc} 1&0&0&1&0&1 \\ 0&1&0&1&1&0 \\ 0&0&1&0&1&1 \end{array}\right]$$

The rank of H is (n − k), and its row space is the null space of the code words developed by G. Then,

$$GH^T = 0$$

Thus,

$$uH^T = 0$$
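
The relation between G and H can be verified mechanically. The sketch below builds G = [P | I_k] and H = [I_(n−k) | P^T] from the parity array P of the example and checks that G H^T = 0 under modulo-2 arithmetic. All helper names are assumptions introduced for illustration.

```python
def identity(size):
    """size x size identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def hstack(left, right):
    """Place two matrices with equal row counts side by side."""
    return [l + r for l, r in zip(left, right)]


def matmul_mod2(a, b):
    """Matrix product with all sums reduced modulo 2."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]


P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
G = hstack(P, identity(3))             # G = [P | I_k]
H = hstack(identity(3), transpose(P))  # H = [I_(n-k) | P^T]

if __name__ == "__main__":
    for row in H:
        print("".join(map(str, row)))
    print(matmul_mod2(G, transpose(H)))
```

Every row of G is a code word, so a zero product confirms that each code word satisfies all three parity checks.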

The parity check generation scheme can be determined by inspecting the rows of H. In the preceding
equation, let a_i represent the ith digit in the message vector; then (in the right partition),

1. First row means that a_1 ⊕ a_3 is the first check digit.

2. Second row means that a_1 ⊕ a_2 is the second check digit.

3. Third row means that a_2 ⊕ a_3 is the third check digit.

Thus, H is useful in describing the encoding process. The equation

$$uH^T = 0$$

is the key to detecting errors. Here, u is a valid code word. If

$$rH^T \neq 0$$

then r is not a code word and the error pattern e must be found. From the standard array, r is somewhere in
the table, and the header code word would be the decoded word. Consider a code word v = (v_m, v_c), where m
means message portion and c means check portion. Form the syndrome defined by

$$S = vH^T = v_m P \oplus v_c$$

Thus, S is an (n − k) vector, where v_m P are the locally generated checks and v_c are the received checks. If S
= 0, no errors are detected. If S ≠ 0, errors are present. Thus, S is determined solely by the error pattern e.
Observe that if r = u ⊕ e,

$$S = rH^T = (u \oplus e)H^T = uH^T \oplus eH^T = 0 \oplus eH^T = eH^T$$

That is, each error has a specific syndrome.

The properties of H are as follows: 

1. No columns are all zero. 

2. All columns are unique. 

3. The dual code of an (n,k) code is generated by H. That is, u_dual = mH.

4. The rank of H is the degree of G (row rank is the number of linearly independent rows). 

5. The number of checks equals the row rank of H T . 


4.3 Syndrome Decoding 


Syndrome decoding is the basic decoding scheme used in block codes. Basically, it relies on the fact that 
each error pattern generates a specific syndrome. Essentially, the decoder takes the message bits and 
regenerates the parity checks. It then compares them with the transmitted checks (by modulo-2 addition). If the 
sum is zero, no error is assumed. If the sum is not zero, at least one of the received digits is in error. The 
decoder must then determine which bits are in error. The error correction procedure is as follows: 

1. Calculate the syndrome S = rH^T = (u + e)H^T = eH^T (there are 2^(n−k) syndromes).

2. From S determine the error pattern (the tough step).

3. Let ê be the error pattern determined from step 2. Note that it may not be the true error pattern shown in
step 1.

4. Form û = r + ê.

Note that if ê = e, then û = u and correct decoding is achieved. It will be shown, however, that the estimate
ê is not always correct and a decoding error occurs. The probability of such an error is the measure of the
code’s strength. Since 2^(n−k) syndromes are possible for an (n,k) code, 2^(n−k) error patterns are correctable. There
are 2^n − 2^(n−k) uncorrectable patterns, and if e is one of them, a decoding error occurs. Some complications
associated with syndrome decoding are as follows:

1. Several e patterns yield the same syndrome S. 

2. Some e patterns are code words and thus undetectable errors. 

3. Since a maximum-likelihood decoder (MLD) always assumes an e with the lowest weight (fewest 
errors), decoding errors occur. 

Example 4.6 

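Consider a (6,3) code, and decode using the standard array. Let

$$G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$$

Therefore,

$$H^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$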

Then, the number of code words is $2^k = 2^3 = 8$ (table 4.1).


TABLE 4.1. — CODE WORDS

Symbol    Code word    Weight
u_1       110100       3
u_2       011010       3
u_3       101110       4
u_4       101001       3
u_5       011101       4
u_6       110011       4
u_7       000111       3
u_8       000000       0


The weight column shows that $d_{min} = 3$, so t = 1; that is, single-error correction is guaranteed. The array is

000000  110100  011010  101110  101001  011101  110011  000111
000001  110101  011011  101111  101000  011100  110010  000110
000010  110110  011000  101100  101011  011111  110001  000101
000100  110000  011110  101010  101101  011001  110111  000011
001000  111100  010010  100110  100001  010101  111011  001111
010000  100100  001010  111110  111001  001101  100011  010111
100000  010100  111010  001110  001001  111101  010011  100111
010001  100101  001011  111111  111000  001100  100010  010110


Observe that the leader of the last coset has two errors and was chosen arbitrarily. Thus, a double-error pattern is correctable, which is in addition to the guaranteed single-error patterns. The syndromes are

$$S_j = e_j H^T$$


TABLE 4.2. — VALUES OF e_j AND S_j

e_j       S_j
000000    000
000001    101
000010    011
000100    110
001000    001
010000    010
100000    100
010001    111

Then, each $e_j$ has a unique $S_j$ (table 4.2). Suppose that the channel adds the error

$$e = 100100$$

Then,

$$S = eH^T = [1\ 0\ 0\ 1\ 0\ 0] \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = 100 \oplus 110 = 010$$

and the decoder would choose $\hat{e} = 010000$ (from table 4.2); thus, a decoder error has occurred.
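The table lookup of example 4.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (syndrome, build_table, decode) are hypothetical, and ties between equal-weight error patterns are broken by enumeration order, so the double-error coset leader chosen here need not be the 010001 picked arbitrarily in the text.

```python
from itertools import product

# H^T of the (6,3) code in example 4.6 (rows are the single-error syndromes).
H_T = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]

def syndrome(word):
    """Return S = word * H^T over GF(2)."""
    s = (0, 0, 0)
    for bit, row in zip(word, H_T):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, row))
    return s

def build_table():
    """Map each syndrome to one lowest-weight error pattern (coset leader)."""
    table = {}
    # Visiting patterns by increasing weight means the first pattern stored
    # for a syndrome has the fewest errors; ties are broken arbitrarily.
    for e in sorted(product((0, 1), repeat=6), key=sum):
        table.setdefault(syndrome(e), e)
    return table

def decode(r, table):
    """Estimate the code word by adding the table's error pattern to r."""
    e_hat = table[syndrome(r)]
    return tuple(a ^ b for a, b in zip(r, e_hat))

table = build_table()
u = (1, 1, 0, 1, 0, 0)                      # transmitted code word
e = (1, 0, 0, 1, 0, 0)                      # double error from the example
r = tuple(a ^ b for a, b in zip(u, e))
print(syndrome(r), table[syndrome(r)], decode(r, table) == u)
```

For the double error 100100 the sketch finds the syndrome 010, selects the single-error pattern 010000, and therefore fails to recover the transmitted word, which is the decoder error described in the example.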

4.4 Classes of Code


Because the classes or types of code are extensive, only some of the more common ones are discussed here. 
The classes or types are not mutually exclusive, as a subset of one class may be a subset of another class or 
classes. The most useful codes are linear group block codes or linear convolutional codes. Some block codes 
are listed below: 

1. Cyclic codes — Codes where a cyclic shift in a code word generates another code word (i.e., if 10110110 is a code word, an end-around shift gives 01011011, which is also a code word).

2. Bose-Chaudhuri-Hocquenghem (BCH) codes — Cyclic codes with the property

$$n = 2^m - 1, \qquad m = 3, 4, 5, \ldots$$

To correct t errors, one needs

$$n - k \le mt$$

or

$$k \ge n - mt, \qquad d_{min} \ge 2t + 1$$

For example, let m = 4, t = 2, and k = 7. Thus, a (15,7) code results, and $d_{min} = 5$.

3. Golay codes — One of the three types of “perfect” code (i.e., a t-error-correcting code whose standard array has all the error patterns of t (or fewer) errors and no others as coset leaders). The two binary forms are (23,12) and (24,12). For these, t = 3.

4. Hamming codes — Hamming codes have the properties

$$n = 2^m - 1, \qquad n - k = m, \qquad m = 1, 2, 3, \ldots$$

$$d_{min} = 3, \qquad t = 1$$

Note that there are $2^{n-k}$ different binary sequences of length n − k (delete the all-zero sequence); then,

$$n = 2^m - 1$$

which defines these codes.
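The "perfect" property mentioned for the Golay and Hamming codes can be checked numerically: a t-error-correcting (n,k) code is perfect when the error patterns of weight t or less exactly fill the $2^{n-k}$ cosets. The sketch below assumes Python 3.8 or later (for math.comb); the function names are hypothetical.

```python
import math

def patterns_up_to(n, t):
    """Number of n-bit error patterns with t or fewer errors."""
    return sum(math.comb(n, i) for i in range(t + 1))

def is_perfect(n, k, t):
    """True when patterns of weight <= t exactly fill the 2^(n-k) cosets."""
    return patterns_up_to(n, t) == 2 ** (n - k)

print(is_perfect(7, 4, 1))    # (7,4) Hamming code
print(is_perfect(23, 12, 3))  # (23,12) Golay code, t = 3
print(is_perfect(6, 3, 1))    # the (6,3) code of example 4.6
```

The first two checks succeed (1 + 7 = 8 and 1 + 23 + 253 + 1771 = 2048), whereas the (6,3) code has one coset left over, which is why a double-error coset leader appeared in its standard array.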


Example 4.7

For the (7,4) Hamming code there are seven possible sequences of length three to choose from: 001, 010, 011, 100, 101, 110, 111. Choose four out of the seven; $\binom{7}{4} = 35$ choices. If the code is to be systematic (two or more binary ones are needed), choose four out of four (hence, only one choice). However, the number of permutations of the four is 4! = 24, which means 24 distinct choices for H. Choose the following pair:

$$H^T = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}$$

The encoder is designed from H (fig. 4.3). In the figure, $m_1$, $m_2$, $m_3$, and $m_4$ are the message bits and $C_1$, $C_2$, and $C_3$ are the three checks. The checks are read from each column of $H^T$. Here,

$$C_1 = m_2 \oplus m_3 \oplus m_4$$
$$C_2 = m_1 \oplus m_3 \oplus m_4$$
$$C_3 = m_1 \oplus m_2 \oplus m_4$$

Figure 4.3. — Encoder for (7,4) Hamming code. (The three checks are developed from the four message bits, $m_1$, $m_2$, $m_3$, and $m_4$.)

For example, let the code word be $y = xG$, with

$$x = 1011$$
$$y = 1011010$$

Assume an error in the fifth digit (counting from the left); then,

$$e = 0000100$$

and

$$z = y \oplus e = 1011110$$

At the decoder, calculate S:

$$S = zH^T = 100$$

Because 100 is the fifth row of $H^T$, the fifth digit is in error. The decoder generates e and adds this to z to correct the error. This association of fifth with fifth is a special case and should not be considered typical. A decoder for the (7,4) Hamming code appears in figure 4.4.
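The encoding and single-error correction of example 4.7 can be sketched as follows. The helper names (xor_rows, encode, correct) are hypothetical, and the sketch assumes the systematic form used in the example, with four message bits followed by three checks.

```python
# Parity part P of G = [I_4 | P] from example 4.7; H^T stacks P on top of I_3.
P = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
H_T = P + [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def xor_rows(bits, rows):
    """Modulo-2 sum of the rows selected by the ones in bits."""
    acc = (0, 0, 0)
    for bit, row in zip(bits, rows):
        if bit:
            acc = tuple(a ^ b for a, b in zip(acc, row))
    return acc

def encode(x):
    """Systematic encoding y = xG: four message bits followed by three checks."""
    return tuple(x) + xor_rows(x, P)

def correct(z):
    """Flip the single digit whose row of H^T equals the syndrome, if any."""
    s = xor_rows(z, H_T)
    z = list(z)
    if any(s):
        z[H_T.index(s)] ^= 1
    return tuple(z)

y = encode((1, 0, 1, 1))          # code word 1011010
z = (1, 0, 1, 1, 1, 1, 0)         # y with the fifth digit in error
print(y, xor_rows(z, H_T), correct(z) == y)
```

Because every nonzero 3-bit syndrome appears exactly once as a row of $H^T$, the lookup H_T.index(s) always succeeds for a nonzero syndrome; a pattern of two or more errors would still be "corrected" to the wrong code word.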

Hamming codes have the following miscellaneous properties: 

1. The total number of distinct Hamming codes with $n = 2^m - 1$ is given by

$$\text{number} = \frac{(2^m - 1)!}{\prod_{i=0}^{m-1}\left(2^m - 2^i\right)}$$

For the (7,4) Hamming code, m = 3 and

$$\prod_{i=0}^{2}\left(2^3 - 2^i\right) = (2^3 - 1)(2^3 - 2)(2^3 - 2^2) = (7)(6)(4)$$

$$\text{number} = \frac{7!}{(7)(6)(4)} = 30$$

Figure 4.4. — Decoder for (7,4) Hamming code generated in figure 4.3.

2. Dual Hamming codes are known as maximal length codes:

$$n = 2^m - 1, \qquad d = 2^{m-1}, \qquad k = m$$

In these codes, all nonzero code words have the same weight; hence, all distances between code words are the same (referred to as “a simplex”).

Other classes of block code include the following:

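1. Reed-Muller codes — Cyclic codes with an overall parity check digit added

$$n = 2^m, \qquad k = \sum_{i=0}^{r}\binom{m}{i}, \qquad d_{min} = 2^{m-r}$$

where r here denotes the order of the code.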



2. Goppa codes— A general noncyclic group that includes the BCH (which are cyclic); mainly of theoretical 
interest 

3. Fire codes — Codes for correcting bursts of errors. A burst of length b is defined as a string of b bits, the 
first and last of which are ones. Here,

$$b \le \frac{n - k + 1}{3}$$

and

$$n = \mathrm{LCM}\left[2^m - 1,\ 2b - 1\right]$$

4. Reed-Solomon codes — Often used nonbinary codes with the following properties:

$$n = m(2^m - 1)\ \text{bits}, \qquad k = n - 2t\ \text{bits}, \qquad d = m(2t + 1)\ \text{bits}$$


4.5 Decoders 

The standard array partitions all the possible $2^n$ n-tuples that may be received into rows and columns. The decoder receives r and finds S. It determines e by either a lookup table or other means and adds this to r to recover the transmitted code word. This scheme is known as maximum-likelihood decoding (MLD). Block decoders are generally classified as algebraic or nonalgebraic. Algebraic types solve sets of equations to determine e; the others use special algorithms. A class of nonalgebraic decoders, called information set decoders, includes Meggitt and threshold types. The decoding processes are discussed in chapter 5. In general, hard decisions are used, as soft decisions cause algorithm and circuit complexity problems. Some decoders handle erasures as well as errors. Error-trapping decoders are discussed in Lin and Costello (1983).


4.6 Counting Errors and Coding Gain 

For simplicity, only binary coding and decoding are assumed. Then, the relationship between the energy of an uncoded bit and that of a coded bit is straightforward:

$$E_c = \frac{k}{n}E_b = rE_b \qquad (4.1)$$

where $E_c$ is the energy for a coded bit (one leaving the encoder), $E_b$ is the energy for an information bit (one entering the encoder), and r is the code rate. For the many digital modulation schemes used, the modems generate and make decisions on symbols (groups of bits), so that the counting of bit errors is more involved. If the codec is turned off, r = 1 and $E_c = E_b$. A given modulation scheme has a bit-error-rate-versus-$E_b/N_0$ plot, which is the probability of received bit error $p_b$ versus the ratio of energy per bit to noise power spectral density. For binary phase shift keying (BPSK) the relationship is

$$p_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \qquad (4.2)$$


and is plotted in figure 4.5. Without coding, the theoretical probability of error is given by equation (4.2). However, in a real system, the curve (fig. 4.5) would be pushed to the right somewhat to account for implementation losses. When coding is applied, the probability of a bit error is (subscript c means coded)

$$p_c = Q\left(\sqrt{\frac{2E_c}{N_0}}\right) = Q\left(\sqrt{\frac{2rE_b}{N_0}}\right) \qquad (4.3)$$

Figure 4.5. — Probability of error per bit in BPSK signaling system versus ratio of energy per bit to noise power spectral density $E_b/N_0$. (For BPSK, QPSK, MSK, and OKQPSK (gray coded), $p_b = \tfrac{1}{2}\,\mathrm{erfc}\sqrt{E_b/N_0} = Q(\sqrt{2E_b/N_0})$.)

Note that because $p_c > p_b$, more errors are emerging from the demodulator. The decoder only works on blocks of bits (code words); therefore, the block error rate must be determined for blocks emerging from the decoder, given the channel bits, with error probability $p_c$, emerging from the demodulator. Once this block error rate is found, the resulting bit error rate into the data sink must somehow be calculated. This last step is difficult, and many approximations are used in the literature.

The probability that a block is decoded incorrectly may be called $p_B$. In the literature,

prob (block decoded in error) = $p_m$ (message error) = $p_w$ (word error) = $p_E$ (decoder error) = $p_B$

Once $p_B$ has been found, the probability of binit (bit) errors emerging from the decoder can be approximated. Then, $(p_b)_s$ (here subscript s means error going into the data sink) can be plotted versus $E_b/N_0$ to see how the code performs. Figure 4.6 shows the uncoded BPSK curve along with those for two (n,k) codes. Note that the vertical axis is both $p_b$ and $(p_b)_s$. Observe that the shapes of the two $(p_b)_s$-versus-$E_b/N_0$ curves are not the same and that neither is representable by some standard Q(·) curve. Each has been calculated point by point. The “threshold points” for both codes are near $E_b/N_0 = 6$ dB (where they intersect the uncoded curve). If $E_b/N_0 < 6$ dB, coding degrades performance because the number of errors is so great that in each received word the number of errors exceeds what the code has been designed to correct. Also, the demodulator makes more errors than in the uncoded case, since decisions are now made on pulses with less signal energy. For $E_b/N_0 > 6$ dB, the correcting power kicks in and improves performance. In this range, the correction capability overtakes the extra demodulator errors that occur due to the lower pulse energy in coded conditions.

Figure 4.6. — Bit error rate of two (n,k) codes along with basic curve for BPSK. (At $p_b = 10^{-5}$ the $(n_2, k_2)$ code has a 1.5-dB coding gain.)

The coding gain is the difference in $E_b/N_0$ between the coded and uncoded plots for the same $p_b = (p_b)_s$. For example, the gain for the $(n_2, k_2)$ code at $p_b = 10^{-5}$ is about 1.5 dB. It can be shown that the asymptotic gain is roughly

$$G = \text{gain (asymptotic)} \approx 10\log[r(t + 1)] \quad \text{for hard decisions}$$

$$G \approx 10\log[r\,d_{min}] \quad \text{for soft decisions}$$

Here, G is in decibels.
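As a quick numerical illustration of the two asymptotic expressions, the sketch below evaluates them for given code parameters. The function names are hypothetical, and the logarithm is taken to base 10, as is usual for decibels.

```python
import math

def hard_decision_gain_db(n, k, t):
    """Asymptotic gain 10 log10[r (t + 1)] with code rate r = k/n."""
    return 10 * math.log10((k / n) * (t + 1))

def soft_decision_gain_db(n, k, d_min):
    """Asymptotic gain 10 log10[r d_min] with code rate r = k/n."""
    return 10 * math.log10((k / n) * d_min)

print(round(hard_decision_gain_db(15, 11, 1), 2))   # (15,11), t = 1 code
print(round(soft_decision_gain_db(7, 4, 3), 2))     # (7,4) Hamming code
```

For the (15,11), t = 1 code the hard-decision value is about 1.66 dB; for the (7,4) Hamming code the soft-decision value is about 2.34 dB.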


Example 4.8

Calculate the change in bit error rate between an uncoded and coded situation. Assume BPSK in Gaussian noise, and assume that the (15,11) BCH (t = 1) code is used. Also assume that hard decisions are made. This problem illustrates the nature of approximations needed to determine the coding gain. The decoder operates only on blocks of digits; therefore, if a block is decoded incorrectly, the bit error rate cannot be determined directly.

Let $p_u$ and $p_c$ represent the uncoded and coded channel bit (more generally, symbol) error probabilities:

$$p_u = Q\left(\sqrt{\frac{2E_b}{N_0}}\right), \qquad p_c = Q\left(\sqrt{\frac{2E_c}{N_0}}\right)$$

Here, $E_b$ and $E_c$ are the bit energies in the uncoded and coded cases. Let $E_b/N_0 = 8.0$ dB for purposes of calculation and assume the data rate R = 4800 bits/sec. Then, without coding,

$$\frac{E_b}{N_0} = 6.3096, \qquad \frac{S}{N_0} = R\left(\frac{E_b}{N_0}\right) = 30\,286 \quad (44.8\ \text{dB})$$

$$p_u = Q\left(\sqrt{12.62}\right) = 2.0425 \times 10^{-4}$$

where the following approximation for Q(x) was used:

$$Q(x) \approx \frac{1}{x\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \qquad x > 3$$

The probability that the uncoded message block will be received in error $(p_m)_u$ is calculated as follows: Each block contains 11 digits (slots). The probability of no error in any slot is $(1 - p_u)$. For 11 consecutive slots, the probability of no error is $(1 - p_u)^{11}$. Then, the probability of some errors in a block is $1 - (1 - p_u)^{11}$. Thus,

$$(p_m)_u = 1 - (1 - p_u)^k = 1 - (1 - p_u)^{11} = 2.245 \times 10^{-3}$$

is the probability that at least one bit is in error out of all 11 in the message block.

With coding,

$$\frac{E_c}{N_0} = \frac{k}{n}\frac{E_b}{N_0} = \frac{11}{15}\frac{E_b}{N_0}$$

so that

$$p_c = Q\left(\sqrt{\frac{11}{15}(12.62)}\right) = 1.283 \times 10^{-3}$$

Note that

$$p_c > p_u$$

as stated earlier. The code performance is not yet apparent, but it will be shown later that $(p_m)_c$, the block error rate for a t-error-correcting code, is

$$(p_m)_c = \sum_{i=t+1}^{n}\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

and here t = 1 and n = 15. A good approximation is just the first term; then,

$$(p_m)_c \approx \binom{15}{2}p_c^2(1 - p_c)^{13} = 1.7 \times 10^{-4}$$

Observe that the block error rate with coding $(p_m)_c$ is less than that for uncoded blocks $(p_m)_u$; that is,

$$(p_m)_c = 1.7 \times 10^{-4} < (p_m)_u = 2.245 \times 10^{-3}$$

even though more bit errors are present at the demodulator output with coding. Note that

$$\frac{(p_m)_u}{(p_m)_c} = 13.2$$

or the code has improved the message error rate by a factor of 13.2. Now, from the block error rate calculate the resulting bit error rate. A commonly used approximation is

$$(p_b)_s \approx \frac{1}{n}\sum_{i=t+1}^{n} i\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

and when t = 1, this can be shown to reduce to (Sklar (1988), appendix D)

$$(p_b)_s = p_c\left[1 - (1 - p_c)^{n-1}\right] = 2.285 \times 10^{-5}$$

Table 4.3 determines the message or block error rates for a range of $E_b/N_0$; they are plotted in figure 4.7 along with the standard BPSK curve. Note that the coded case is worse than the uncoded one at 4 dB and crosses at about 4.7 dB.


TABLE 4.3. — BLOCK ERROR RATES 



Table 4.4 gives $(p_b)_s$, the bit error rate into the sink; this is plotted in figure 4.8. It crosses the BPSK curve at about 5.5 dB. At $(p_b)_s = 1.0 \times 10^{-7}$, the gain is about 1.3 dB. The approximate gain given earlier is

$$G(\text{asym})_{dB} = 10\log_{10}\left[\frac{11}{15}(2)\right] = 1.66\ \text{dB}$$

which agrees within the normal limits in such problems.

Figure 4.7. — Coded $(p_m)_c$ and uncoded $(p_m)_u$ block error rates (dashed lines) for (15,11), t = 1 code.
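The arithmetic of this example at $E_b/N_0$ = 8.0 dB can be reproduced with a short script. It is a sketch only: the variable names are hypothetical, math.comb assumes Python 3.8 or later, and the Q function is deliberately replaced by the same large-argument approximation used in the example, so the numbers match the text rather than an exact evaluation of Q.

```python
import math

def q_approx(x):
    """Approximation Q(x) ~ exp(-x^2/2) / (x sqrt(2 pi)), intended for x > 3."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

n, k, t = 15, 11, 1
ebn0 = 10 ** (8.0 / 10)                         # 8.0 dB expressed as a ratio

p_u = q_approx(math.sqrt(2 * ebn0))             # uncoded channel bit error
p_c = q_approx(math.sqrt(2 * (k / n) * ebn0))   # coded channel bit error

pm_u = 1 - (1 - p_u) ** k                       # uncoded block error rate
# First term of the block error sum for a t-error-correcting code.
pm_c = math.comb(n, t + 1) * p_c ** (t + 1) * (1 - p_c) ** (n - t - 1)
pb_s = p_c * (1 - (1 - p_c) ** (n - 1))         # bit error rate into sink, t = 1

print(f"p_u={p_u:.4e}  p_c={p_c:.4e}")
print(f"(p_m)_u={pm_u:.4e}  (p_m)_c={pm_c:.4e}  ratio={pm_u / pm_c:.1f}")
print(f"(p_b)_s={pb_s:.4e}")
```

The printed values agree with those quoted in the example to the precision shown there (for instance, a block error improvement of about 13.2).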


TABLE 4.4. — BIT ERROR RATE INTO SINK

E_b/N_0, dB    p_c               (p_b)_s
4              0.1237            0.1042
6              7.87 x 10^-3      8.249 x 10^-4
7              3.368 x 10^-3     1.554 x 10^-4
8              1.283 x 10^-3     2.285 x 10^-5
8.5            6.8872 x 10^-4    6.611 x 10^-6
9.0            3.45 x 10^-4      1.664 x 10^-6
9.5            1.6 x 10^-4       3.59 x 10^-7
9.6            1.36 x 10^-4      2.583 x 10^-7
10.0           6.81 x 10^-5      6.48 x 10^-8

Figure 4.8. — Bit error rate of (15,11), t = 1 code (dashed line).

The calculation of the probability of a bit error from a decoder is necessarily vague, since the causes for errors are found in signaling technique, signal-to-noise ratio, interference, demodulator type, decoder implementation, code, etc. Essentially, the decoder emits blocks that should be code words. However, the blocks can be erroneous for two basic reasons. First, the error pattern could be a code word; thus, an undetected error event occurs. The number of incorrect bits is indeterminate; all we know is that the block is in error. Second, the error pattern could have more errors than the code can handle; this is sometimes called algorithm or code shortcomings. Summarizing the determination of code gain again,

1. The uncoded bit error rate is known from basic modulation theory; for example, (BPSK)

$$p_u = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$

2. The coded bit error rate is then calculated for an (n,k) code as

$$p_c = Q\left(\sqrt{\frac{2(k/n)E_b}{N_0}}\right)$$

3. The uncoded message, or block, error rate can be found by

$$(p_m)_u = 1 - (1 - p_u)^k$$

but it is not necessary in the final analysis.

4. The coded message, or block, error rate must be found. Many expressions are available, and a commonly used one is

$$(p_m)_c = \sum_{i=t+1}^{n}\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

5. Once this is found, the number of bit errors into the sink $(p_b)_s$ is calculated. A commonly used expression is

$$(p_b)_s = \frac{1}{n}\sum_{i=t+1}^{n} i\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

which is written in terms of the coded probability $p_c$. The form of $(p_b)_s$ is nearly the same as $(p_m)_c$ except that each term in $(p_b)_s$ is weighted by the factor i/n.

6. Plotting $p_u$ and $(p_b)_s$ on a common scale permits the graphical determination of the gain.
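Steps 1, 2, 5, and 6 above can be sketched numerically instead of graphically. The sketch below makes several assumptions that are not in the text: BPSK with hard decisions, the Q function evaluated exactly through math.erfc rather than by the large-argument approximation, and a simple bisection search (hypothetical helper names) in place of reading the gain off a plot. Because of the exact Q function, the result need not coincide with the roughly 1.3 dB read from the figures.

```python
import math

def q(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pb_uncoded(ebn0_db):
    """Step 1: uncoded BPSK bit error rate."""
    return q(math.sqrt(2 * 10 ** (ebn0_db / 10)))

def pb_sink(ebn0_db, n, k, t):
    """Steps 2 and 5: bit error rate into the sink for a t-error-correcting code."""
    p_c = q(math.sqrt(2 * (k / n) * 10 ** (ebn0_db / 10)))
    total = sum(i * math.comb(n, i) * p_c ** i * (1 - p_c) ** (n - i)
                for i in range(t + 1, n + 1))
    return total / n

def required_ebn0_db(curve, target, lo=0.0, hi=20.0):
    """Bisect for the Eb/N0 (dB) at which a decreasing curve reaches target."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if curve(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def coding_gain_db(n, k, t, target):
    """Step 6: horizontal distance between uncoded and coded curves at target."""
    coded = required_ebn0_db(lambda x: pb_sink(x, n, k, t), target)
    return required_ebn0_db(pb_uncoded, target) - coded

print(coding_gain_db(15, 11, 1, 1e-7))
```

The bisection relies on both curves decreasing monotonically with $E_b/N_0$ over the search interval, which holds for the small channel error probabilities of interest here.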

The interplay between $p_u$, $(p_m)_c$, and $(p_b)_s$ depends on the code structure, the algorithm implementation, and the form chosen for G. Different equations are found for $(p_m)_c$ and $(p_b)_s$ because various assumptions are used and special formulas are valid for specific codes. Thus, the literature is rich in formulas, many of which are summarized here.

4.6.1 Formulas for Message or Block Errors

The following notation is used:

$$(p_m)_c = p_B$$

In the concept of the weight distribution of a code, the number of code words with the specific weight i is represented by $A_i$. The complete set $\{A_i\}$ represents the complete weight distribution for the specific code. The weight distributions for some codes are known and published, but many codes have unknown distributions. The distribution is denoted by the enumerating polynomial

$$A(x) = \sum_{i=0}^{n} A_i x^i$$

where $A_i$ is the number of code words with weight i. For the dual (n, n − k) code, the enumerator is known to be

$$B(x) = 2^{-k}(1 + x)^n A\left(\frac{1 - x}{1 + x}\right)$$

For Hamming codes,

$$A(x) = \frac{1}{n + 1}\left[(1 + x)^n + n(1 + x)^{(n-1)/2}(1 - x)^{(n+1)/2}\right]$$

For their duals, which are maximal length codes $(2^m - 1, m)$,

$$A(x) = 1 + (2^m - 1)x^{2^{m-1}}$$

For the Golay (23,12) code,

$$A(x) = 1 + 253(x^7 + 2x^8 + 2x^{15} + x^{16}) + 1288(x^{11} + x^{12}) + x^{23}$$

For the extended Golay (24,12) code,

$$A(x) = 1 + 759(x^8 + x^{16}) + 2576x^{12} + x^{24}$$

Note that for the extended code, the odd-weight code words (weights 7, 15, 11, and 23 of the basic code) have been eliminated. For Reed-Solomon codes,

$$A_i = \binom{n}{i}(q - 1)\sum_{j=0}^{i - d_{min}}(-1)^j\binom{i - 1}{j}q^{\,i - d_{min} - j}$$

where q is the number of symbols in the code alphabet.

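An (n,k) code can detect $2^n - 2^k$ error patterns. Of these, $2^{n-k}$ are correctable. The number of undetectable error patterns is $2^k - 1$. The most commonly used formula for $p_B$ is

$$p_B = (p_m)_c = \sum_{i=t+1}^{n}\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

which is due to algorithm shortcomings (i.e., more errors than the code can handle). The block errors due to undetected errors may have the following forms:

$$p_B(\text{undetected}) = \sum_{i=1}^{n} A_i p_c^i(1 - p_c)^{n-i}$$

(note that $A_i = 0$ for $i < d_{min}$) or

$$p_B(\text{undetected}) = 1 - \sum_{i=0}^{d_{min}-1}\binom{n}{i}p_c^i(1 - p_c)^{n-i} = \sum_{j=d_{min}}^{n}\binom{n}{j}p_c^j(1 - p_c)^{n-j}$$

For the special case of codes with n − k = 1,

$$p_B(\text{undetected}) = \sum_{j=1}^{n/2}\binom{n}{2j}p_c^{2j}(1 - p_c)^{n-2j}, \qquad n\ \text{even}$$

$$p_B(\text{undetected}) = \sum_{j=1}^{(n-1)/2}\binom{n}{2j}p_c^{2j}(1 - p_c)^{n-2j}, \qquad n\ \text{odd}$$

In general, the complete expression for $p_B$ is the sum of both; that is,

$$p_B(\text{total}) = p_B + p_B(\text{undetected})$$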
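To make the weight-distribution form of the undetected error probability concrete, the sketch below expands the Hamming-code enumerator into its coefficients $A_i$ and evaluates $\sum_i A_i p_c^i (1 - p_c)^{n-i}$. The function names are hypothetical, and math.comb assumes Python 3.8 or later.

```python
import math

def hamming_weights(m):
    """Weight distribution A_0..A_n of the Hamming code with n = 2^m - 1, read off
    the enumerator (1/(n+1)) [(1+x)^n + n (1+x)^((n-1)/2) (1-x)^((n+1)/2)]."""
    n = 2 ** m - 1
    a, b = (n - 1) // 2, (n + 1) // 2
    weights = []
    for i in range(n + 1):
        # Coefficient of x^i in (1+x)^a (1-x)^b.
        mixed = sum(math.comb(a, j) * math.comb(b, i - j) * (-1) ** (i - j)
                    for j in range(max(0, i - b), min(a, i) + 1))
        weights.append((math.comb(n, i) + n * mixed) // (n + 1))
    return weights

def p_undetected(weights, p_c):
    """Probability that the error pattern is itself a nonzero code word."""
    n = len(weights) - 1
    return sum(A * p_c ** i * (1 - p_c) ** (n - i)
               for i, A in enumerate(weights) if i > 0)

A = hamming_weights(3)
print(A)                       # weight distribution of the (7,4) code
print(p_undetected(A, 1e-3))
```

For m = 3 the coefficients are 1, 0, 0, 7, 7, 0, 0, 1, so the (7,4) code has seven code words of weight 3, seven of weight 4, and one of weight 7 besides the all-zero word.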

However, in the literature it is seldom clear what approximations are made (i.e., whether the undetected portion is omitted or not).

Many bounds for $p_B$ have been developed to help in the general case, and the most commonly discussed ones are

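1. Sphere packing (upper bound case)

$$p_B \le \sum_{j=(d_{min}+1)/2}^{n}\binom{n}{j}p_c^j(1 - p_c)^{n-j}, \qquad d_{min}\ \text{odd}$$

$$p_B \le \sum_{j=d_{min}/2}^{n}\binom{n}{j}p_c^j(1 - p_c)^{n-j}, \qquad d_{min}\ \text{even}$$

2. Union bound

$$p_B \le \sum_{j=1}^{n} A_j P_j$$

where

$$P_j = \sum_{i=(j+1)/2}^{j}\binom{j}{i}p_c^i(1 - p_c)^{j-i}, \qquad j\ \text{odd}$$

$$P_j = \frac{1}{2}\binom{j}{j/2}p_c^{j/2}(1 - p_c)^{j/2} + \sum_{i=j/2+1}^{j}\binom{j}{i}p_c^i(1 - p_c)^{j-i}, \qquad j\ \text{even}$$

3. Sphere packing (lower bound case). Let t be the largest integer such that

$$2^{n-k} \ge \sum_{i=0}^{t}\binom{n}{i}$$

and

$$N_{t+1} = 2^{n-k} - \sum_{i=0}^{t}\binom{n}{i}$$

Then,

$$p_B \ge 1 - \sum_{i=0}^{t}\binom{n}{i}p_c^i(1 - p_c)^{n-i} - N_{t+1}\,p_c^{t+1}(1 - p_c)^{n-t-1}$$

4. Plotkin (a lower bound). This is a bound on the minimum distance available. The effect on $p_B$ is therefore indirect.

$$n - k \ge 2d_{min} - 2 - \log_2 d_{min}$$

In these formulas, the inequalities become exact only for the “perfect codes” (i.e., Hamming, Golay, and repetition).

4.6.2 Formulas for Bit Errors Into Sink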

The most common formula for the bit error rate from the decoder, which goes to the sink, is

$$(p_b)_s = \sum_{i=t+1}^{n}\frac{\beta_i}{n}\binom{n}{i}p_c^i(1 - p_c)^{n-i}$$

where $\beta_i$ is the number of remaining errors in the decoded block. Obviously, this number is vague and the following limits are generally imposed:

$$i - t \le \beta_i \le i + t$$

Here, i is the number of errors entering the decoder. Other forms are


(Pb) s 


{Pb) s = — — — - Ps(total) 
v s n 

( \ _v . ~ — 1 

\ Pb >s « L f J 2 n — 1 2" -1 

/=! 


PB 


p B (undetected); p 5 (undetected) = 
n 


( Pb) s = 


,k - 1 


2 * -1 


PB 



The reasoning behind these formulas is as follows: Under the pessimistic assumption that a pattern of i bit errors (i > t) will cause the decoded word to differ from the correct word in (i + t) positions, a fraction (i + t)/n of the k information symbols is decoded erroneously. Alternatively, a block error will contain at least t + 1 errors (if it is detectable) or 2t + 1 bit errors (if it is not). Thus, on the average the factor 1.5t + 1 results.

A result published by Torrieri (1984) is perhaps most accurate: 


or 


/=r+l 


;Wo-*r4 . 1 



The first equation is exact for the odd-n repetition code, d = n,k= 1. 

Some simple bounds on $(p_b)_s$ can be developed as follows: Consider 1 sec of transmission; the number of code words transmitted during this interval is $1/T_w$, where $T_w$ is the duration of a code word. Since each code word contains k information symbols, the total number of information symbols transmitted is $k/T_w$. The number of word errors is $p_B/T_w$. If α denotes the number of information symbol errors per word error, the bit error probability is

$$(p_b)_s = \frac{\alpha p_B/T_w}{k/T_w} = \frac{\alpha p_B}{k}$$

which is simply the ratio of the number of information symbols in error to the total number of information symbols transmitted. The problem, however, is to determine α, which varies from case to case. As a worst case, assume that each word error results in k information symbol errors; then,

$$(p_b)_s \le p_B$$

The lower bound is obtained by considering the most favorable situation in which each word error results in only one information symbol error. For this case α = 1 and

$$(p_b)_s \ge \frac{p_B}{k}$$

For small values of k, the bounds are tight and $(p_b)_s \approx p_B$.

A simple approximation for the high-$E_b/N_0$ cases is as follows: Here the symbol error probability is quite small, and word errors are probably due to t + 1 symbol errors. Of these t + 1 symbol errors, (t + 1)(k/n) are, on the average, information symbol errors; thus,

$$\alpha \approx (t + 1)\frac{k}{n}$$

and the approximation

$$(p_b)_s \approx \frac{t + 1}{n}\,p_B$$

follows. Another upper bound is

$$(p_b)_s \le 2^{-n(r_0 - r)}$$

where

$$r_0 = 1 - \log_2\left[1 + 2\sqrt{p_c(1 - p_c)}\right]$$

is the cutoff rate.
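The simple bounds and the high-$E_b/N_0$ approximation can be compared numerically once $p_B$ is available. The sketch below takes $p_B$ from the block error sum used earlier in this chapter; the function names and the dictionary keys are hypothetical, and math.comb assumes Python 3.8 or later.

```python
import math

def block_error(n, t, p_c):
    """p_B for a t-error-correcting code: probability of more than t errors."""
    return sum(math.comb(n, i) * p_c ** i * (1 - p_c) ** (n - i)
               for i in range(t + 1, n + 1))

def bit_error_estimates(n, k, t, p_c):
    """Lower bound p_B/k, high-Eb/N0 estimate (t+1) p_B / n, upper bound p_B."""
    p_B = block_error(n, t, p_c)
    return {"lower": p_B / k, "high_snr": (t + 1) * p_B / n, "upper": p_B}

def cutoff_rate(p_c):
    """Cutoff rate r_0 = 1 - log2[1 + 2 sqrt(p_c (1 - p_c))]."""
    return 1 - math.log2(1 + 2 * math.sqrt(p_c * (1 - p_c)))

print(bit_error_estimates(15, 11, 1, 1.283e-3))
print(cutoff_rate(1.283e-3))
```

For the (15,11), t = 1 code the high-$E_b/N_0$ estimate falls between the two bounds, since (t + 1)/n = 2/15 lies between 1/k = 1/11 and 1.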

The following bounds on $d_{min}$ indirectly affect $(p_b)_s$:

1. Varshamov-Gilbert-Sacks bound

$$\sum_{i=0}^{d_{min}-2}\binom{n-1}{i} < 2^{n-k}$$

2. Elias bound

$$\frac{d_{min}}{n} \le 2A(1 - A)$$

where 0 < A < 1 and A satisfies the equation

$$r = \frac{k}{n} = 1 + A\log_2 A + (1 - A)\log_2(1 - A)$$

All BCH codes (which are used often) are known for n ≤ 1023, and the relationships between n, $d_{min}$, and t are

$$t = \frac{d_{min}}{2} - 1, \qquad d_{min}\ \text{even}$$

$$t = \frac{d_{min} - 1}{2}, \qquad d_{min}\ \text{odd}$$

$$n - k \ge b - 1 + \log_2 n$$

where b is the burst length.

For Hamming codes a special formula exists: 

$$(p_b)_s = 1 - \gamma_0(1 - p_c)^n - \gamma_1 p_c(1 - p_c)^{n-1} - \gamma_2 p_c^2(1 - p_c)^{n-2}$$

where $\gamma_i$ is the number of coset leaders of weight i and


Y /* 



n 




4.7 Formula Development 

The extensive compilation of formulas for $p_B$ and $(p_b)_s$ was necessary, since $(p_b)_s$ is needed to calculate the
coding gain. Coding gain is the main figure of merit for a communications system application. The computed
gain for a given code is at best rather approximate, and the uncertainty at (p b ) s = 10 is about 
0.9 dB (difference between bounds). At $(p_b)_s = 10^{-6}$, this reduces to about 0.5 dB. Since the realizable gain for
most practical situations is about 3.5 to 4.5 dB, the uncertainty is about 25 percent. This fact is part of the
reason why bit-error-rate testers (BERT's) are often used to evaluate a codec pair on a simulated channel.

The columns of the standard array divide the n-tuples into subsets of words “close” to the column header. The number of n-tuples $N_e$ in each set obeys the following (for a t-error-correcting code):

$$N_e \ge 1 + n + \binom{n}{2} + \cdots + \binom{n}{t}$$

Note that there are exactly n patterns that differ from the column header in one position, $\binom{n}{2}$ patterns that differ in two positions, etc. Previous examples show that almost always some patterns are left over after assigning all those that differ in t or fewer places (thus, the inequality). Since there are $2^n$ possible sequences, the number of code words $N_c$ obeys

$$N_c = 2^k \le \frac{2^n}{1 + n + \binom{n}{2} + \cdots + \binom{n}{t}}$$

which is known as the Hamming or sphere packing bound.

Several developments for the block error rate $p_B$ are presented here. Note that

prob (any one bit received correctly) = $(1 - p_c)$

prob (all n received correctly) = $(1 - p_c)^n$

prob (received block has some error) = $1 - (1 - p_c)^n$

prob (first bit in error; others correct) = $p_c(1 - p_c)^{n-1}$

prob (just one bit in error) = $np_c(1 - p_c)^{n-1}$

The last expression follows, since the bit in error can be in any of the n possible slots in the block and all others are correct.

$$\text{prob (two or more errors)} = \left[1 - (1 - p_c)^n\right] - np_c(1 - p_c)^{n-1}$$

Here, the first term is the probability of some error; the second is the probability of one error. This last expression is the block error probability for a single-error-correcting (and only single) code. Sometimes, this is called the undetected incorrect block error probability, but the same terminology also applies to the case when the error pattern is itself a code word. Thus, some confusion is possible. Rewrite this as

$$\text{prob (two or more errors)} = p_B\ (\text{undetected if Hamming})$$

$$\approx \frac{p_c^2\,n(n - 1)}{2}, \qquad p_c\ \text{small}$$

$$\approx \frac{(p_c n)^2}{2}, \qquad p_c\ \text{small},\ n\ \text{large}$$

The calculation for two errors is as follows: For a particular pattern of two errors, the probability of error is

$$p_c^2(1 - p_c)^{n-2}$$

That is, two in error and n − 2 correct. The total number of different patterns that contain two errors is

$$\binom{n}{2} = \frac{n!}{2!\,(n - 2)!}$$

or the number of combinations formed by choosing from a pool of n distinct objects, grabbing them two at a time. The distinctness of n stems from each slot carrying a label. Then,

$$\text{prob (two errors)} = \binom{n}{2}p_c^2(1 - p_c)^{n-2}$$

Generalizing to $\ell$ errors gives

$$\text{prob}\ (\ell\ \text{errors}) = \binom{n}{\ell}p_c^{\ell}(1 - p_c)^{n-\ell}$$

Note that

$$\sum_{\ell=0}^{n}\binom{n}{\ell}p_c^{\ell}(1 - p_c)^{n-\ell} = 1$$

Alternatively, the coefficient for two errors can be viewed as follows: Observe that

$$\binom{n}{2} = \frac{n!}{2!\,(n - 2)!}$$

is also the number of permutations of n objects, two of which are alike (the errors) and n − 2 of which are alike (the undamaged bits).
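The counting argument above is easy to check numerically. The following sketch (hypothetical function names; math.comb assumes Python 3.8 or later) evaluates the probability of exactly $\ell$ errors in a block of n bits and compares the probability of two or more errors with the small-$p_c$ estimate $n(n - 1)p_c^2/2$.

```python
import math

def prob_errors(n, l, p_c):
    """Probability of exactly l errors in n independent bits."""
    return math.comb(n, l) * p_c ** l * (1 - p_c) ** (n - l)

def prob_two_or_more(n, p_c):
    """Probability of some error minus the probability of exactly one error."""
    return 1 - (1 - p_c) ** n - n * p_c * (1 - p_c) ** (n - 1)

n, p_c = 15, 1e-3
total = sum(prob_errors(n, l, p_c) for l in range(n + 1))
exact = prob_two_or_more(n, p_c)
estimate = n * (n - 1) * p_c ** 2 / 2
print(total, exact, estimate)
```

The probabilities over all $\ell$ sum to 1 within rounding, and for n = 15 and $p_c = 10^{-3}$ the estimate differs from the exact value by roughly 1 percent.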

To end this section, refer to table 4.5, which catalogs the various expressions for $p_u$ for many digital
modulation schemes. These equations may be plotted when needed to perform a gain calculation.


TABLE 4.5— MODULATION ERROR RATES 


Let A = a 


IEl 

N„ 


Wef 


N n 


C = — expf- -^-1; R = bi 
2 2N 0 ) 


bit rate. 


Type of signaling 


Baseband unipolar 
Baseband polar 

Bandpass binary phase shift keying 
(BPSK) 

Bandpass quadraphase shift keying 
(QPSK, gray coded) 

Minimum shift keying (MSK) 

On-off keying (OOK) 

Frequency shift keying (FSK) 

Differential phase shift keying 
(DPSK) 

Differentially encoded quadrature 
phase shift keying (DEQPSK) 

M-ary 


Required 

bandwidth 


Rf 2 
RI2 
R 

R/2 

3R12 

R 

R+2Af 

(4f«/2-/l) 

R 


Pu 


B 

A 

A 

*) 

A 

C 

B 

C 

B 

C 

C 

2 B 


^coherent detection; matched filter; hard decision 

coherent 

noncoherent 

coherent 

noncoherent ( Et>IN 0 > 1/4) 

coherent 

noncoherent 

noncoherent 




: eX P 


^symbol 

l JNo . 


4.8 Modification of Codes 

Often, there is a need to modify a specific code to conform to system constraints. In other words, the values 
of n and k must be changed so that the code “fits” into the overall signaling scheme. The block length can be 
increased or decreased by changing the number of information and check bits. The block length can be kept 
constant while changing the number of code words. The changes that are possible will be illustrated for the 
Hamming (7,4) code. The basic (7,4) code is cyclic and the defining matrices are

$$G = \left[\begin{array}{ccc|cccc} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right]$$

$$H = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}$$

For cyclic codes, another notation is used for the generator, namely the generator polynomial. This polynomial and what it means are discussed in chapter 5. For the above G, it is

$$g(x) = 1 + x + x^3$$


The changes to the code are illustrated in figure 4.9, which is the example in Clark and Cain (1981). 

Figure 4.9. — Changes that specific (7,4) code can assume for specific applications.

A code may be extended by annexing additional parity checks. The added checks are carefully chosen to improve code weight structure (i.e., to modify the set $\{A_i\}$). For a single overall parity check addition, the check is equal to the remainder obtained by dividing the original code word by the polynomial x + 1. With the additional check the weight of all code words is an even number. Thus, the (7,4), d = 3 (the subscript min is dropped for convenience) Hamming code becomes an (8,4), d = 4 code. Because the new code is no longer cyclic, no generator polynomial is given. All codes with an odd minimum distance will have it increased by one by the addition of an overall parity check. A code may be punctured by deleting parity check digits. Puncturing is the inverse of extending. The deleted check is carefully chosen to keep the minimum distance the same as that before puncturing. A code may be expurgated by discarding some of the code words. For cyclic codes, this can be accomplished by multiplying g(x) by x + 1. For the case (x + 1), the new generator is g(x)(x + 1), and the code words are just the even-weight ones from the original code. A code may be augmented by adding new code words. Augmentation is the inverse of expurgation. Any cyclic code can be augmented by dividing out one of its factors. For example, if g(x) has the factor x + 1, then g(x)/(x + 1) generates another code with the same code word length. A code may be lengthened by adding additional information symbols. For a binary cyclic code that has a factor x + 1, the lengthening is done in two steps. First, augment by dividing by x + 1; then, extend by adding an overall parity check. A code may be shortened by deleting information bits. For cyclic codes, this can be done by making a segment of the information symbols identically zero at the beginning of each code word. A shortened cyclic code is no longer cyclic. In summary,


(n, k) → (n + 1, k)           extended by 1
(n, k) → (n − i, k − i)       0 < i < k, shortened by i
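Two of the modifications described above, extending and expurgating, can be demonstrated by brute force on the basic (7,4) code. The sketch assumes the systematic generator matrix of $g(x) = 1 + x + x^3$ with the three checks in the leading positions; the function names are hypothetical, and expurgation is carried out by simply keeping the even-weight code words rather than by multiplying g(x) by x + 1.

```python
from itertools import product

# Systematic generator matrix of the cyclic (7,4) code with g(x) = 1 + x + x^3.
G = [(1, 1, 0, 1, 0, 0, 0),
     (0, 1, 1, 0, 1, 0, 0),
     (1, 1, 1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0, 0, 1)]

def code_words(G):
    """All modulo-2 linear combinations of the rows of G."""
    n = len(G[0])
    words = []
    for m in product((0, 1), repeat=len(G)):
        w = (0,) * n
        for bit, row in zip(m, G):
            if bit:
                w = tuple(a ^ b for a, b in zip(w, row))
        words.append(w)
    return words

def min_distance(words):
    """Minimum distance of a linear code = smallest nonzero code word weight."""
    return min(sum(w) for w in words if any(w))

def extend(words):
    """Append an overall parity check so that every word has even weight."""
    return [w + (sum(w) % 2,) for w in words]

def expurgate(words):
    """Keep only the even-weight code words."""
    return [w for w in words if sum(w) % 2 == 0]

base = code_words(G)
print(len(base), min_distance(base))                        # (7,4), d = 3
print(len(extend(base)), min_distance(extend(base)))        # (8,4), d = 4
print(len(expurgate(base)), min_distance(expurgate(base)))  # (7,3), d = 4
```

Extending leaves the number of code words at 16 while raising d from 3 to 4; expurgating halves the number of code words to 8 and also yields d = 4.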


EjXaniple 4 9 

This example follows the discussion in Sweeney (1991). To shorten the code with matrices 


1 

0 

0 

0 


0 0 
1 0 
0 1 
0 0 


0 

0 

0 

1 


1 1 0 
1 0 1 
0 1 1 
1 1 1 


H = 


1 

1 

0 


1 0 1 
0 1 1 
1 1 1 


1 0 0 
0 1 0 
0 0 1 


first set one of the information bits permanently to zero and then remove that bit from the code. Let us set the 
third information bit to zero and thus remove the third row from G: 

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}$$


Next, to delete that bit, remove the third column: 

$$G = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}$$


The parity check matrix changes as follows: The checks at the end of the deleted row in G appear as the third 
column of H, so that the third column should be deleted: 

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}$$

which is a (6,3) code. 
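
The row-and-column deletion used above can be sketched in a few lines of Python. This is only an illustration under the assumption that G is in systematic form [I | P]; the helper name `shorten_systematic` is hypothetical and not part of the original notes.

```python
def shorten_systematic(G, pos):
    """Drop information position `pos` (0-based) from a systematic G = [I | P].

    Removing row `pos` fixes that information bit at zero; removing
    column `pos` then deletes the (always zero) bit from every code word.
    """
    return [row[:pos] + row[pos + 1:] for i, row in enumerate(G) if i != pos]


# (7,4) generator matrix of Example 4.9
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

G_short = shorten_systematic(G, 2)  # third information bit
for row in G_short:
    print(row)
```

The result has three rows of length six, that is, a (6,3) code.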

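A second example of shortening uses the (15,11) code with H: 

$$H = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

Removing all the odd-weight code words by deleting all the even-weight columns gives 

$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

which is an (8,4) code with d = 4. 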


▲ 

Chapter 5 

Block Coding (Detailed) 

5.1 Finite Fields 

An (n,k) code comprises a finite number of code words, and if certain properties are incorporated, the code 
words can be treated as elements of a finite field. The set {0, 1, 2, 3, ..., p - 1} is a finite field of 
order p (p is a prime number) under modulo-p addition and multiplication. It can be shown that the order of 
any finite field is a power of a prime; fields of prime order are called prime fields, and finite fields in general 
are also called Galois fields. A prime field is denoted as GF(p). 

Example 5.1 

In modulo-p addition, take two elements in the field and add them (ordinary addition); the modulo-p sum is 
the remainder obtained by dividing the result by p. For p = 5, the table below summarizes the procedure. 

 ⊕ | 0  1  2  3  4
---+---------------
 0 | 0  1  2  3  4
 1 | 1  2  3  4  0
 2 | 2  3  4  0  1
 3 | 3  4  0  1  2
 4 | 4  0  1  2  3


In modulo-p multiplication, take two elements and multiply (ordinary); the remainder after division by p is 
the result. The table below summarizes the operation for p = 5. 

 · | 1  2  3  4
---+------------
 1 | 1  2  3  4
 2 | 2  4  1  3
 3 | 3  1  4  2
 4 | 4  3  2  1
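
As a quick check of the two tables, the following Python sketch generates the modulo-p addition and multiplication tables for any prime p. The function name `gf_tables` is an illustrative choice, not something defined in these notes.

```python
def gf_tables(p):
    """Return (addition, multiplication) tables for GF(p), with p prime.

    Entry [a][b] holds (a + b) mod p or (a * b) mod p, respectively.
    """
    add = [[(a + b) % p for b in range(p)] for a in range(p)]
    mul = [[(a * b) % p for b in range(p)] for a in range(p)]
    return add, mul


add5, mul5 = gf_tables(5)
print("addition mod 5")
for row in add5:
    print(row)
print("multiplication mod 5 (nonzero elements)")
for row in mul5[1:]:
    print(row[1:])
```

Every nonzero row of the multiplication table is a permutation of 1, ..., p - 1, which is what guarantees that each nonzero element has a multiplicative inverse.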


It is possible to extend the field GF(p) to a field of $p^m$ (where m is a positive integer) elements, called an 
extension field of GF(p), denoted by GF($p^m$). 

Example 5.2 

GF(2) is the set {0,1} with modulo-2 addition and multiplication: 

 ⊕ | 0  1          · | 0  1
---+------        ---+------
 0 | 0  1          0 | 0  0
 1 | 1  0          1 | 0  1

▲ 

From here on, only a + is used for modulo-2 addition, for convenience. 


5.1.1 Properties of GF($2^m$) 

The notation GF(q) is used often; here, q = 2 (in general, $q = 2^m$). A polynomial f(x) with coefficients from 
GF(2), 

$$f(x) = f_0 + f_1 x + f_2 x^2 + \cdots + f_n x^n$$

where $f_i$ = 0 or 1, is a polynomial over GF(2). There are $2^n$ polynomials of degree n. Division of polynomials 
is crucial. Let 

$$f(x) = 1 + x + x^4 + x^5 + x^6$$

$$g(x) = 1 + x + x^3$$

Then, the long division f(x)/g(x) is 

$$\begin{array}{rl}
x^3 + x + 1 \,\big)\!\!\! & x^6 + x^5 + x^4 + x + 1 \qquad \big(x^3 + x^2 \leftarrow q(x)\big) \\
& x^6 + x^4 + x^3 \\
& x^5 + x^3 + x + 1 \\
& x^5 + x^3 + x^2 \\
& x^2 + x + 1 \leftarrow r(x)
\end{array}$$

or 

$$f(x) = q(x)g(x) + r(x)$$

where q(x) is the quotient and r(x) is the remainder. When r(x) = 0, f is divisible by g and g is a factor of f. If 
f(x) has an even number of terms, it is divisible by x + 1. A root $x_r$ of f(x) means $f(x_r) = 0$. A polynomial p(x) 
over GF(2) of degree m is said to be irreducible over GF(2) if p(x) is not divisible by any polynomial over 
GF(2) of degree less than m but greater than zero. Any irreducible polynomial over GF(2) of degree m divides 

$$x^{2^m - 1} + 1$$
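
The long division above is easy to mechanize. The sketch below is one possible representation, assumed here for illustration only: a polynomial over GF(2) is stored as a Python integer whose bit i is the coefficient of $x^i$, so that modulo-2 addition becomes the XOR operator. The name `gf2_divmod` is hypothetical.

```python
def gf2_divmod(f, g):
    """Divide f(x) by g(x) over GF(2); return (quotient, remainder).

    Polynomials are ints: bit i is the coefficient of x^i.
    """
    if g == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = 0
    deg_g = g.bit_length() - 1
    while f.bit_length() - 1 >= deg_g:
        shift = (f.bit_length() - 1) - deg_g
        q ^= 1 << shift   # add x^shift to the quotient
        f ^= g << shift   # subtract (= add, modulo 2) x^shift * g(x)
    return q, f


f = 0b1110011  # x^6 + x^5 + x^4 + x + 1
g = 0b1011     # x^3 + x + 1
q, r = gf2_divmod(f, g)
print(bin(q), bin(r))
```

For the polynomials of the text, the quotient is $x^3 + x^2$ and the remainder is $x^2 + x + 1$.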


Example 5.3 

Note that $p(x) = x^3 + x + 1$ divides $x^{2^3 - 1} + 1 = x^7 + 1$, which is consistent with p(x) being irreducible. 

▲ 


An irreducible polynomial p(x) of degree m is primitive if the smallest positive integer n for which p(x) 
divides $x^n + 1$ is $n = 2^m - 1$. A list of primitive polynomials is given in table 5.1. For each degree m, only a 
polynomial with the fewest number of terms is listed; others exist but are not given. 

TABLE 5.1. — PRIMITIVE POLYNOMIALS 

  m    Polynomial
  3    $1 + x + x^3$
  4    $1 + x + x^4$
  5    $1 + x^2 + x^5$
  6    $1 + x + x^6$
  7    $1 + x^3 + x^7$
  8    $1 + x^2 + x^3 + x^4 + x^8$
  9    $1 + x^4 + x^9$
 10    $1 + x^3 + x^{10}$
 11    $1 + x^2 + x^{11}$
 12    $1 + x + x^4 + x^6 + x^{12}$
 13    $1 + x + x^3 + x^4 + x^{13}$


A useful property of polynomials over GF(2) is 

$$[f(x)]^2 = f(x^2)$$

5.1.2 Construction of GF($2^m$) 

To construct a field, first introduce a symbol α and then construct the set 

$$F = \{0, 1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^j, \ldots\}, \qquad \alpha^0 \triangleq 1$$

Because the set is infinite, truncate it in the following way: Since 

$$x^{2^m - 1} + 1 = q(x)p(x), \qquad p(x) \text{ primitive}$$

replace x by α: 

$$\alpha^{2^m - 1} + 1 = q(\alpha)p(\alpha)$$

Set p(α) = 0; then, 

$$\alpha^{2^m - 1} + 1 = 0$$

or 

$$\alpha^{2^m - 1} = 1$$

which truncates the set to 

$$F = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^{2^m - 2}\}$$

Example 5.4 

Construct the field GF($2^4$) by using $p(x) = 1 + x + x^4$. Note that p(x) is given in table 5.1. Set 
p(α) = 0: 

$$1 + \alpha + \alpha^4 = 0$$

Then, 

$$\alpha^4 = 1 + \alpha$$

This last identity is used repeatedly to represent the elements of this field. For example, 

$$\alpha^5 = \alpha\alpha^4 = \alpha(1 + \alpha) = \alpha + \alpha^2$$

$$\alpha^6 = \alpha\alpha^5 = \alpha(\alpha + \alpha^2) = \alpha^2 + \alpha^3$$

$$\alpha^7 = \alpha\alpha^6 = \alpha(\alpha^2 + \alpha^3) = \alpha^3 + \alpha^4 = \alpha^3 + 1 + \alpha = 1 + \alpha + \alpha^3$$

etc. Note that $\alpha^{15} = 1$. Three representations of the field are given in table 5.2. 


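TABLE 5.2. — THREE REPRESENTATIONS 
FOR ELEMENTS OF GF($2^4$) GENERATED 
BY $p(x) = 1 + x + x^4$ 

 Power            Polynomial                              4-tuple
 0                0                                       (0000)
 1                1                                       (1000)
 $\alpha$         $\alpha$                                (0100)
 $\alpha^2$       $\alpha^2$                              (0010)
 $\alpha^3$       $\alpha^3$                              (0001)
 $\alpha^4$       $1 + \alpha$                            (1100)
 $\alpha^5$       $\alpha + \alpha^2$                     (0110)
 $\alpha^6$       $\alpha^2 + \alpha^3$                   (0011)
 $\alpha^7$       $1 + \alpha + \alpha^3$                 (1101)
 $\alpha^8$       $1 + \alpha^2$                          (1010)
 $\alpha^9$       $\alpha + \alpha^3$                     (0101)
 $\alpha^{10}$    $1 + \alpha + \alpha^2$                 (1110)
 $\alpha^{11}$    $\alpha + \alpha^2 + \alpha^3$          (0111)
 $\alpha^{12}$    $1 + \alpha + \alpha^2 + \alpha^3$      (1111)
 $\alpha^{13}$    $1 + \alpha^2 + \alpha^3$               (1011)
 $\alpha^{14}$    $1 + \alpha^3$                          (1001)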
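
The entries of table 5.2 can be regenerated by repeatedly multiplying by α and applying $\alpha^4 = 1 + \alpha$. The following Python sketch assumes, for illustration, that each field element is stored as a 4-bit integer whose bit i is the coefficient of $\alpha^i$; the names `build_gf16` and `as_tuple` are hypothetical.

```python
def build_gf16():
    """Return [alpha^0, alpha^1, ..., alpha^14] for p(x) = 1 + x + x^4.

    Each element is a 4-bit int; bit i is the coefficient of alpha^i.
    """
    prim = 0b10011          # x^4 + x + 1
    elems = []
    a = 1                   # alpha^0
    for _ in range(15):
        elems.append(a)
        a <<= 1             # multiply by alpha
        if a & 0b10000:     # degree 4 appeared: apply alpha^4 = 1 + alpha
            a ^= prim
    return elems


def as_tuple(e):
    """4-tuple in the order used by table 5.2: (1, alpha, alpha^2, alpha^3)."""
    return "".join(str((e >> i) & 1) for i in range(4))


elems = build_gf16()
for power, e in enumerate(elems):
    print(f"alpha^{power:<2d} ({as_tuple(e)})")
```

Because p(x) is primitive, the 15 powers are all distinct, and one more multiplication by α returns to 1.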


Observe that the “elements” of the field are 4-tuples formed from ones and zeroes. Each element has three 
representations, and each is used in different steps in subsequent discussions. A general element is given the 
symbol β. For example, 

$$\beta = \alpha^{12} \leftrightarrow 1 + \alpha + \alpha^2 + \alpha^3 \leftrightarrow (1111)$$

Let β be an element of GF($2^m$). Let φ(x) be the polynomial over GF(2) of smallest degree 
such that φ(β) = 0. Then, φ(x) (it is unique) is the minimal polynomial of β. Minimal polynomials derived 
from the GF($2^4$) field are given in table 5.3. 


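TABLE 5.3. — MINIMAL POLYNOMIALS 
OF ELEMENTS IN GF($2^4$) 

[Generated by $p(x) = 1 + x + x^4$.] 

 Conjugate roots                                          Minimal polynomial
 0                                                        $x$
 1                                                        $x + 1$
 $\alpha, \alpha^2, \alpha^4, \alpha^8$                   $x^4 + x + 1$
 $\alpha^3, \alpha^6, \alpha^9, \alpha^{12}$              $x^4 + x^3 + x^2 + x + 1$
 $\alpha^5, \alpha^{10}$                                  $x^2 + x + 1$
 $\alpha^7, \alpha^{11}, \alpha^{13}, \alpha^{14}$        $x^4 + x^3 + 1$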


▲ 


5.2 Encoding and Decoding 

The simplest encoding/decoding scheme is best explained by a specific example; the one chosen is the 
example in Lin and Costello (1983). 

Example 5.5 

For a (7,4) Hamming code, choose the generator as 

$$G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$v = uG$$

Here, the parity check digits are at the beginning of the code word. The circuit to encode a message vector 
$u = (u_0, u_1, u_2, u_3)$ is given in figure 5.1. The message register is filled by clocking in $u_3, u_2, u_1, u_0$ and 
simultaneously passing them to the output. Next, the modulo-2 adders form the outputs $v = (v_0, v_1, v_2)$ in the 
parity register. The switch at the right moves to extract v. The parity check matrix is 

$$H = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}$$
The coset leaders and corresponding syndromes are 

 Syndrome    Coset leader
   100         1000000
   010         0100000
   001         0010000
   110         0001000
   011         0000100
   111         0000010
   101         0000001

Figure 5.1. — Encoder for (7,4) Hamming code. 


The circuit to perform the correction is shown in figure 5.2. The received bits are entered as $r_0, \ldots, r_6$. The 
modulo-2 adders form the syndrome $(s_0, s_1, s_2)$. A combinatorial logic network calculates the appropriate error 
pattern, and the last row of adders serves to add $e_i$ to $r_i$ and correct the word, which is placed in the “corrected 
output” buffer. If only a single error is present, only one $e_i$ is present, and the corresponding $r_i$ is corrected. 



▲ 
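
The encoding and table-lookup correction of Example 5.5 can be imitated in software. The Python sketch below is an illustration only, not a model of the circuits in figures 5.1 and 5.2: it assumes the G and H given above, builds the syndrome-to-coset-leader table from the single-error patterns, and corrects one error. All function names are hypothetical.

```python
G = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
]
H = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1],
]


def encode(u):
    """v = uG over GF(2); u holds the four message bits (u0, u1, u2, u3)."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(r):
    """s = r H^T over GF(2)."""
    return tuple(sum(r[j] * H[i][j] for j in range(7)) % 2 for i in range(3))


# Syndrome -> coset leader, built from the seven single-error patterns
TABLE = {}
for pos in range(7):
    e = [0] * 7
    e[pos] = 1
    TABLE[syndrome(e)] = e


def decode(r):
    """Correct at most one error by adding the coset leader to r."""
    s = syndrome(r)
    if s == (0, 0, 0):
        return list(r)
    return [(a + b) % 2 for a, b in zip(r, TABLE[s])]


v = encode([1, 0, 1, 1])
r = list(v)
r[4] ^= 1                      # single channel error
print(v, syndrome(r), decode(r))
```

Each single-error syndrome is simply the corresponding column of H, which is why the seven table entries are distinct.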
5.2.1 Cyclic Codes and Encoders 

Many codes are cyclic (an end-around shift of a code word is also a code word; i.e., if 1001011 is a code 
word, then 1100101 is the first end-around shift and is also a code word). Such codes can be represented by a 
generator polynomial g(x). The recipe for an (n,k) code is 

$$v(x) = u(x)g(x) \qquad (5.1)$$

where 

$$v(x) = v_0 + v_1 x + v_2 x^2 + \cdots + v_{n-1} x^{n-1}$$

$$u(x) = u_0 + u_1 x + u_2 x^2 + \cdots + u_{k-1} x^{k-1}$$

$$g(x) = 1 + g_1 x + g_2 x^2 + \cdots + g_{n-k-1} x^{n-k-1} + x^{n-k}$$

A property of g(x) is that g(x) for an (n,k) code divides $x^n + 1$; that is, 

$$x^n + 1 = g(x)h(x)$$

For example, for n = 7, 

$$x^7 + 1 = (1 + x + x^3)(1 + x + x^2 + x^4)$$

This factoring is not obvious and must be found by table look-up in general. Further factoring is also possible 
in this case: 

$$x^7 + 1 = (1 + x)(1 + x + x^3)(1 + x^2 + x^3)$$

where there are two g(x) factors, both of degree 3. Therefore, each generates a (7,4) code. Observe that the 
code word v(x) in equation (5.1) is not in systematic form but can be put in that form with the following 
procedure: 

1. Premultiply the message u(x) by $x^{n-k}$. 

2. Divide by g(x) to obtain a remainder b(x). 

3. Form $v(x) = x^{n-k}u(x) + b(x)$. 

Example 5.6 

Encode $u = 1101 \rightarrow u(x) = 1 + x + x^3$ with $g(x) = 1 + x + x^3$ in a (7,4) code. Form 

$$x^3(1 + x + x^3) = x^3 + x^4 + x^6$$

Form 

$$\frac{x^3 + x^4 + x^6}{x^3 + x + 1}$$

The quotient is $q(x) = x^3$ with remainder b(x) = 0. Then, 

$$v(x) = x^3 + x^4 + x^6 = 0001101$$

Note that the last four digits (1101) are the message and the first three (000) are the parity check digits. 
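
The three-step systematic encoding procedure can be sketched in Python as follows. This is a minimal illustration that assumes polynomials are stored as integers (bit i is the coefficient of $x^i$); the names `gf2_mod`, `encode_systematic`, `bits`, and `from_bits` are hypothetical helpers.

```python
def gf2_mod(f, g):
    """Remainder of f(x) divided by g(x) over GF(2) (ints, bit i <-> x^i)."""
    deg_g = g.bit_length() - 1
    while f.bit_length() - 1 >= deg_g:
        f ^= g << (f.bit_length() - 1 - deg_g)
    return f


def encode_systematic(u, g, n, k):
    """Systematic cyclic encoding: v(x) = x^(n-k) u(x) + b(x)."""
    shifted = u << (n - k)       # step 1: premultiply by x^(n-k)
    b = gf2_mod(shifted, g)      # step 2: remainder after division by g(x)
    return shifted ^ b           # step 3: append the parity digits


def from_bits(s):
    """'1101' (u0 first) -> int with bit i = coefficient of x^i."""
    return int(s[::-1], 2)


def bits(v, n):
    """int -> string v0 v1 ... v_(n-1)."""
    return "".join(str((v >> i) & 1) for i in range(n))


g = 0b1011  # 1 + x + x^3
print(bits(encode_systematic(from_bits("1101"), g, 7, 4), 7))
```

For the message 1101 the remainder is zero and the output is 0001101, with the parity digits first and the message last.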


Example 5.7 

This problem shows the correspondence between the generator matrix and the generator polynomial for 
cyclic codes. Consider a (7,4) cyclic code generated by 

$$g(x) = 1 + x + x^3$$

Determine its generator matrix G in systematic form. The procedure is to divide $x^{n-k+i}$ by g(x) for 
i = 0, 1, 2, ..., k - 1. For i = 0, $x^{n-k} = x^3$: 

$$\begin{array}{rl}
x^3 + x + 1 \,\big)\!\!\! & x^3 \qquad \big(1 \leftarrow q(x)\big) \\
& x^3 + x + 1 \\
& x + 1 \leftarrow r(x)
\end{array}$$

so that $x^3 = q(x)g(x) + r(x) \Rightarrow x^3 = 1 \cdot g(x) + (1 + x)$. For i = 1, after division, 

$$x^4 = x\,g(x) + (x + x^2)$$

Continuing, 

$$x^5 = (1 + x^2)g(x) + (1 + x + x^2)$$

$$x^6 = (1 + x + x^3)g(x) + (1 + x^2)$$

Rearrange the above to obtain four code polynomials 

$$v_0(x) = 1 + x + x^3$$

$$v_1(x) = x + x^2 + x^4$$

$$v_2(x) = 1 + x + x^2 + x^5$$

$$v_3(x) = 1 + x^2 + x^6$$

which are found by adding together the single term $x^{n-k+i}$ on the left with the remainder. That is, $x^3$ is added to 
(1 + x) to form $v_0(x)$. Use these as rows of a (4 × 7) matrix; thus, 

$$G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

Note that g(x) can be read from G by observing the first row of G. The row 1101000 corresponds to $x^0$, $x^1$, and 
$x^3$ so that 

$$g(x) = x^0 + x^1 + x^3 = 1 + x + x^3$$


▲ 


5.2.2 Encoder Circuits Using Feedback 

Most encoders use feedback shift registers. Recall that the code word can be found in two ways, 

$$v(x) = u(x)g(x)$$

or 

$$v(x) = x^{n-k}u(x) + b(x)$$

where the generator polynomial has the form 

$$g(x) = 1 + g_1 x + g_2 x^2 + \cdots + g_{n-k-1} x^{n-k-1} + x^{n-k}$$

Figure 5.3 gives the basic encoder scheme. The multiplication of the message vector by $x^{n-k}$ basically adds 
zeros onto the left of the message vector, which gives enough bits for n complete shifts of the register. The 
operation proceeds with switch I closed and switch II down. The machine shifts k times, loading the message 
into the registers. At this time, the message vector has moved out and comprises part of v(x), and at the same time the 
parity checks have been formed and reside in the registers. Next, switch I is opened, switch II moves up, and 
the remaining n - k shifts move the parity checks into v(x). During these shifts the leading zeros appended to 
u(x) earlier are shifted into the register, clearing it. 


Example 5.8 

Encode the message vector u = 1011 into a (7,4) code by using the generator polynomial $g(x) = 1 + x + x^3$: 

$$u(x) = 1011 = 1 + x^2 + x^3$$

$$x^{n-k}u(x) = x^3 + x^5 + x^6$$

$$x^{n-k}u(x) = v(x) + b(x) = q(x)g(x) + b(x)$$

$$b(x) = \text{remainder mod } g(x) \text{ of } x^{n-k}u(x) = 1$$

Here q(x) is the quotient of the division by g(x). For the (n - k) = 3 stage encoding shift register shown in figure 5.4, the steps are as shown. After the fourth 
shift, switch I is opened, switch II is moved up, and the parity bits contained in the register are shifted to the 
output. The output code vector is v = 1001011, or in polynomial form, $v(x) = 1 + x^3 + x^5 + x^6$. 


Next, consider the syndrome calculation using a shift register. Recall that the syndrome was calculated by 
using modulo-2 adders in figure 5.2; a different method using registers is given in figure 5.5. Here, the received 
vector is shifted in; and after it has been loaded, the syndrome occupies the register. The lower portion gives 
the syndrome calculator for the (7,4) code used in previous examples. Note that the generator matrix used for 
the case in figure 5.2 yields the same generator polynomial as shown in figure 5.5; thus, different 
implementations of the same decoding scheme can be compared. 



 Input queue    Shift number    Register contents    Output
 0001011             0                000               -
 000101              1                110               1
 00010               2                101               1
 0001                3                100               0
 000                 4                100               1
 00                  5                010               0
 0                   6                001               0
 -                   7                000               1

Figure 5.4.— Cyclic encoder steps while encoding message 
vector u(x) = 1011. 
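
The register contents listed in figure 5.4 can be reproduced with a short simulation. The Python sketch below is an assumed software model of the feedback encoder, not a description of the actual hardware: the feedback bit is the input plus the last stage, and stage i receives the previous stage plus the feedback wherever $g_i = 1$. The name `lfsr_encode` and the tap list format are illustrative choices.

```python
def lfsr_encode(message, g_taps):
    """Simulate an (n - k)-stage feedback shift register encoder.

    message : bits u0 ... u_(k-1); the highest-order bit enters first.
    g_taps  : [g0, g1, ..., g_(n-k-1)] of g(x), with g0 = 1.
    Returns (output bits in time order, register contents after each shift).
    """
    r = [0] * len(g_taps)
    out, states = [], []
    for bit in reversed(message):          # switch I closed, switch II down
        fb = bit ^ r[-1]                   # feedback = input + last stage
        r = [fb] + [r[i - 1] ^ (fb & g_taps[i]) for i in range(1, len(r))]
        out.append(bit)                    # message bit goes to the output
        states.append("".join(map(str, r)))
    for _ in range(len(r)):                # switch I open, switch II up
        out.append(r[-1])                  # parity bits shift out
        r = [0] + r[:-1]
        states.append("".join(map(str, r)))
    return out, states


out, states = lfsr_encode([1, 0, 1, 1], [1, 1, 0])   # g(x) = 1 + x + x^3
print(states)
print(out)
```

The outputs appear highest order first, so reversing them gives the code vector $(v_0, \ldots, v_6)$ = 1001011.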
Figure 5.5. — Decoder using shift register. (a) General syndrome 
calculator. (b) Calculator for specific (7,4) code given by generator 
$g(x) = 1 + x + x^3$. 


5.3 Decoders 

In syndrome decoding for general block codes and for the special case of cyclic codes, the difficult step of 
determining the error pattern e commences once the syndrome is known. Many algorithms have been 
developed for this stage of decoding, and their evolution and implementation form a large body of material in 
the journals. Each has its own tradeoffs in performance, cost, and complexity. According to Clark and Cain (1981), 
decoders are algebraic or nonalgebraic. Algebraic decoders solve simultaneous equations to find e; also, finite- 
field Fourier transforms are sometimes used. Only hard-decision decoders are discussed here, since they find 
the most use. Soft-decision decoders (nonalgebraic, such as Massey’s APP (a posteriori probability), 
Hartmann-Rudolph, Weldon, partial syndrome, etc.) are omitted. The nonalgebraic decoders use properties of 
codes to find e, and in many instances a code and decoder are “made for each other.” Some schemes discussed 
here are also used with convolutional codes, as covered in chapters 6 and 7. 

The delineation of decoding algorithms is not crisp. For example, some authors use Meggitt decoders as a 
classification with feedback decoding being a subset. Others, however, include Meggitt decoders as a special 
form of feedback decoding. Following the lead of both Clark and Cain (1981) and of Lin and Costello (1983), 
the discussion of decoders begins with cyclic codes. 

5.3.1 Meggitt Decoders 

The algorithm for Meggitt decoders depends on the following properties of cyclic codes: 

1. There is a one-to-one correspondence between each member in the set of all correctable errors 
and each member in the set of all syndromes. 

2. If the error pattern is shifted cyclically one place to the right, the new syndrome is obtained by advancing 
the feedback shift register containing S(x) one shift to the right. 

These properties imply that the set of error patterns can be divided into equivalence classes, where each class 
contains all cyclic shifts of a particular pattern. For a cyclic code of block length n, each class can be identified 
by advancing the syndrome register no more than n times and testing for a specific pattern after each shift. 
Figure 5.6 shows a basic form for a Meggitt decoder that uses feedback (some forms do not use feedback). 

Figure 5.6.— Feedback Meggitt decoder. 

The received vector is shifted into the storage buffer and syndrome calculator simultaneously. At the completion of 
the load step, a syndrome resides in the syndrome calculator. Next, the pattern detector tests the syndrome to 
see if it is one of the correctable error patterns with an error at the highest order position. If a correctable 
pattern is detected, a one appears at the pattern detector’s output; the received symbol in the rightmost stage of 
the storage buffer is assumed to be in error and is corrected by adding the one to it. If a zero appears at the 
pattern detector’s output, the received symbol at the rightmost stage is assumed to be correct, and no correction 
is needed (adding a zero does not change it). As the first received bit is read from the storage buffer (corrected 
if needed), the syndrome calculator is shifted once. The output of the pattern detector is also fed back to the 
syndrome calculator to modify the syndrome. This effectively “removes” the effect of this error on the 
syndrome and results in a new syndrome corresponding to the altered received vector shifted one place to the 
right. This process repeats, with each received symbol being corrected sequentially. This basic idea has many 
variations and many differences in the number of times the received vector is shifted versus the number of 
times the syndrome calculator can change. Also, the phase of shifts can vary. In this manner, bursts of errors 
are handled as well as shortened cyclic codes. The Meggitt decoder for the (7,4) code is shown in figure 5.7. 



Figure 5.7.— Meggitt decoder for specific (7,4) cyclic code, $g(x) = 1 + x + x^3$. 

5.3.2 Error-Trapping Decoders 

Error-trapping decoders are a subset of Meggitt decoders, and several forms and enhancements on the basic 
concept exist (e.g., Kasami’s method). They work because of the following property: If errors are confined to 
the n - k high-order positions of the received polynomial r(x), the error pattern e(x) is identical to $x^k S^{(n-k)}(x)$, 
where $S^{(n-k)}(x)$ is the syndrome of $r^{(n-k)}(x)$, the (n - k)th cyclic shift of r(x). When this event occurs, the 
decoder computes e(x) and adds it to r(x). In other words, the scheme searches segments of r(x) in hopes 
of finding a segment that contains all the errors (error trapping). If the number of errors in r(x) is t or less and 
if they are confined to n - k consecutive positions, the errors are trapped in the syndrome calculator only when 
the weight of the syndrome in the calculator is t or less. The weight of S(x) is tested by an (n - k)-input threshold 
gate whose output is one when t or fewer of its inputs are one. Its inputs come from the syndrome calculator. 

5.3.3 Information Set Decoders 

Information set decoders work on a large class of codes (hard or soft decision). In an (n,k) group code, an 
information set is defined to be any set of k positions in the code word that can be specified independently. The 
remaining n - k positions are referred to as the “parity set.” If the generator matrix for the code can be written 
in echelon canonical form, the first k positions form an information set. Any other set of positions can form an 
information set if it is possible to make the corresponding columns of the generator matrix into unit weight 
columns through elementary row operations. For example, consider the (7,4) Hamming code whose generator 
is 

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{bmatrix}$$

By adding the first row to the third and fourth rows, this matrix can be transformed to 

$$G' = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}$$

This has the effect of “interchanging” columns 1 and 5. Positions 2, 3, 4, and 5 now form an information set 
(have only a single one in their columns). This example shows that a necessary and sufficient condition for 
being able to “interchange” any arbitrary column with one of the unit weight columns is that they both have a 
one in the same row. By this criterion, column 1 can be interchanged with column 5 or 6 but not with 
column 7, column 2 can be interchanged with column 6 or 7 but not with column 5, etc. Since the symbols 
contained in the information set can be specified independently, they uniquely define a code word. If there are 
no errors in these positions, the remaining symbols in the transmitted code word can be reconstructed. This 
property provides the basis for all information set algorithms. A general algorithm is as follows: 

1. Select several different information sets according to some rule. 

2. Construct a code word for each set by assuming that the symbols in the information set are correct. 

3. Compare each hypothesized code word with the actual received sequence and select the code word that 
is closest (smallest metric, closest in Hamming distance). 

5.3.4 Threshold Decoders 

Threshold decoders are similar to Meggitt decoders but need certain code structures. Majority-logic decoding 
is a form of threshold decoding for hard-decision cases and was once used often. (It is seldom used now.) 
Threshold decoding uses circuitry to work on the syndrome to produce a likely estimate of some selected error 
digit. The main point is that any syndrome digit, being a linear combination of error digits, represents a known 
sum of error digits. Further, any linear combination of syndrome digits is thus also a known sum of error digits. 
Hence, all $2^{n-k}$ such possible combinations of syndrome digits are all of the known sums of error digits 
available at the receiver. Such a sum is called a parity check equation and denoted by $A_i$ (the ith parity check 
equation). Thus, each $A_i$ is a syndrome digit or a known sum of syndrome digits. A parity check equation $A_i$ is 
said to check an error digit $e_j$ if $e_j$ appears in $A_i$. A set $\{A_i\}$ of parity check equations is said to be orthogonal 
on $e_m$ if each $A_i$ checks $e_m$ but no other error digit is checked by more than one $A_i$. For example, the 
following set is orthogonal on $e_3$ (all additions are modulo-2): 

$$A_1 = e_1 \oplus e_2 \oplus e_3$$

$$A_2 = e_3 \oplus e_4 \oplus e_5$$

$$A_3 = e_3 \oplus e_6 \oplus e_7$$

Although $e_3$ appears in each $A_i$, each of the other error digits appears in only a single $A_i$. Majority-logic 
decoding is a technique of solving for a specific error digit given an orthogonal set of parity check equations 
for that error digit and is characterized by the following: Given a set of J = 2T + S parity checks orthogonal on 
$e_m$, any pattern of T or fewer errors in the digits checked by the set $\{A_i\}$ will cause no decoding error (i.e., is 
correctable), and patterns of T + 1, ..., T + S errors are detectable if $e_m$ is decoded by the rule 

$\hat{e}_m = 1$ if more than (J + S)/2 of the $A_i$ have value 1 
$\hat{e}_m = 0$ if (J - S)/2 or fewer have value 1 
error detection only, otherwise 

Here, $\hat{e}_m$ denotes the estimate of $e_m$. Thus, J + 1 corresponds to the effective minimum distance for majority- 
logic decoding. Further, it can be shown that the code must have a minimum distance of at least J + 1. A code 
is completely orthogonalized if $d_{min} - 1$ orthogonal parity check equations can be found for each error digit. 
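
The decision rule can be tried on the three checks orthogonal on $e_3$ given above. The Python sketch below is illustrative only: it evaluates the checks directly from an assumed error vector, whereas a real decoder would obtain each $A_i$ from the syndrome. Here J = 3, and the default S = 1 gives T = 1. The function name is hypothetical.

```python
def majority_decode_e3(e, S=1):
    """Estimate e3 from the three parity checks orthogonal on e3.

    e : error digits (e1, ..., e7).  Returns 1, 0, or None, where None
    means "error detected but not corrected".
    """
    e1, e2, e3, e4, e5, e6, e7 = e
    checks = [e1 ^ e2 ^ e3, e3 ^ e4 ^ e5, e3 ^ e6 ^ e7]   # A1, A2, A3
    J = len(checks)
    ones = sum(checks)
    if ones > (J + S) / 2:
        return 1
    if ones <= (J - S) / 2:
        return 0
    return None


print(majority_decode_e3((0, 0, 1, 0, 0, 0, 0)))   # single error on e3
print(majority_decode_e3((1, 0, 0, 0, 0, 0, 0)))   # single error elsewhere
print(majority_decode_e3((0, 0, 1, 1, 0, 0, 0)))   # two errors: detection only
```

With one error the estimate of $e_3$ is correct whether or not the error falls on $e_3$; with two errors in the checked digits, exactly two checks fail and the rule reports detection only.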

Algebraic decoders operate on “algebraically defined” codes, such as BCH codes. The algebraic structure 
imposed on the codes permits computationally efficient decoding algorithms. First, the underlying structure 
of these BCH codes must be studied. A primitive BCH code has 

$$n = 2^m - 1, \qquad n - k \le mt, \qquad t < 2^{m-1}, \qquad m \ge 3, \qquad d_{min} \ge 2t + 1$$

The generator polynomial is of the form 

$$g(x) = m_1(x)\, m_3(x)\, m_5(x) \cdots m_{2t-1}(x)$$

(i.e., t factors). 

Write the parity check matrix in the form (for n = 15; with the two rows shown, each entry being a 4-tuple, the code is (n,k) = (15,7)) 

$$H = \begin{bmatrix} a_1 & a_2 & \cdots & a_{15} \\ a_1^3 & a_2^3 & \cdots & a_{15}^3 \end{bmatrix}$$

The $\{a_i\}$, i = 1,...,15, are distinct nonzero elements of GF($2^4$). If errors occur in positions i and j of the received 
word, the syndrome 

$$S = eH^T = (s_1, s_2, s_3, s_4)$$

produces two equations in two unknowns 

$$a_i + a_j = s_1$$

and 

$$a_i^3 + a_j^3 = s_3$$

If these equations could be solved for $a_i$ and $a_j$, the error locations i and j would be known. Error correction 
would then consist of inverting the received symbols in these locations. Because the equations are nonlinear, 
any method of solving them directly is not obvious. However, it is possible to begin by eliminating one of the 
variables. Thus, solving the first equation for $a_i$ and substituting into the second equation yields 

$$a_j^2 + s_1 a_j + s_1^2 + \frac{s_3}{s_1} = 0$$

Had the first equation been solved for $a_j$, the resulting equation would be the same, with $a_i$ replacing $a_j$. 
Consequently, both $a_i$ and $a_j$ are solutions (or roots) of the same polynomial: 

$$\sigma(z) = z^2 + s_1 z + s_1^2 + \frac{s_3}{s_1} = 0$$

This polynomial is called an error locator polynomial. One method of finding its roots is simple trial and error. 
Substituting each of the nonzero elements from GF($2^4$) into this equation guarantees that the location of both 
errors will be found. The complete recipe for decoding is as follows: 

1. From r(x) calculate remainders modulo $m_1$, $m_3$, and $m_5$; these result in partial syndromes $s_i$. For a t-error- 
correcting code, there are 2t such m-bit syndromes. 

2. From the $s_i$, find the coefficients for an e-degree error locator polynomial (e ≤ t), where e is the number 
of errors. The technique for doing this is called the Berlekamp iterative algorithm. This polynomial σ(x) has the 
significance that its roots give the location of the errors in the block. The roots are the error location numbers 
$\alpha^i$, i = 0, ..., 14 (if n = 15). 

3. Find the roots, generally by using the Chien search, which involves checking each of the n code symbol 
locations to see if that location corresponds to a root. 

4. Correct the errors. For binary codes, this entails just complementing the erroneous bit. For Reed- 
Solomon codes (nonbinary), a formula for correcting the symbol exists. 
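
The trial-and-error root search for the two-error locator polynomial can be sketched in Python. The sketch makes several assumptions that are not in the original notes: GF($2^4$) is generated by $p(x) = 1 + x + x^4$, position i is associated with the element $\alpha^i$, the two errors are in distinct positions (so $s_1 \neq 0$), and the syndromes $s_1$ and $s_3$ are formed directly from the known error positions rather than from a received word. All names are hypothetical.

```python
def gf16_tables():
    """Antilog (EXP) and log (LOG) tables for GF(2^4), p(x) = 1 + x + x^4."""
    exp, log = [0] * 30, {}
    a = 1
    for i in range(15):
        exp[i] = exp[i + 15] = a
        log[a] = i
        a <<= 1
        if a & 0b10000:
            a ^= 0b10011
    return exp, log


EXP, LOG = gf16_tables()


def mul(x, y):
    """Product of two field elements."""
    if x == 0 or y == 0:
        return 0
    return EXP[LOG[x] + LOG[y]]


def div(x, y):
    """Quotient x / y for nonzero y."""
    if x == 0:
        return 0
    return EXP[(LOG[x] - LOG[y]) % 15]


def locate_two_errors(s1, s3):
    """Roots of z^2 + s1 z + s1^2 + s3/s1, returned as exponents of alpha."""
    c = mul(s1, s1) ^ div(s3, s1)
    return [i for i in range(15)
            if (mul(EXP[i], EXP[i]) ^ mul(s1, EXP[i]) ^ c) == 0]


# Errors assumed at positions 2 and 9
i, j = 2, 9
s1 = EXP[i] ^ EXP[j]
s3 = EXP[(3 * i) % 15] ^ EXP[(3 * j) % 15]
print(locate_two_errors(s1, s3))
```

Testing all 15 nonzero elements in turn is the same idea as the Chien search in step 3, only without the hardware-oriented organization.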
5.4 Miscellaneous Block Code Results 

5.4.1 Reed-Solomon Codes 

Reed-Solomon (R-S) codes use the following procedure: 

1. Choose nonbinary symbols from GF($2^m$). Each symbol has m bits (e.g., for m = 8, a symbol such as (10101010) 
is eight bits). 

2. Define $q = 2^m$. Then, 

$$N = q - 1 \qquad \text{symbols/word}$$

$$N - K = 2t \qquad \text{to correct } t \text{ symbols}$$

$$d_{min} = 2t + 1$$

Since $d_{min} = N - K + 1$, the code is maximum-distance separable (largest possible $d_{min}$). On a bit basis, 

$$N \rightarrow n = m(2^m - 1) \text{ bits}$$

$$N - K \rightarrow m(N - K) \text{ check bits}$$

which is cyclic (subset of BCH) and good for bursts. 

3. Use Berlekamp-Massey or Euclidean decoders, which can correct 

1 burst of length $b_1 = (t - 1)m + 1$ bits 

2 bursts of length $b_2 = (t - 3)m + 3$ bits 

i bursts of length $b_i = (t - 2i + 1)m + 2i - 1$ bits 

4. Let b be the maximum correctable burst length (guaranteed), and let $\ell$ be the length of the shortest burst 
in a code word (1xxxxx1): 

$$b \le \frac{\ell - 1}{2}$$

For example, for (N,K) = (15,9), 

$$t = 3, \qquad m = 4, \qquad d_{min} = 7$$

If the code is viewed as a binary (60,36) code, it can correct any burst confined to three four-bit symbols. The 
generator polynomial, where α is in GF($2^m$), is 

$$g(x) = (x + \alpha)(x + \alpha^2)(x + \alpha^3) \cdots (x + \alpha^{d-1})$$

$$g(x) = (x + \alpha) \cdots (x + \alpha^6) \qquad \text{for above example}$$

$$g(x) = x^6 + \alpha^{10}x^5 + \alpha^{14}x^4 + \alpha^4 x^3 + \alpha^6 x^2 + \alpha^9 x + \alpha^6$$


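5. To calculate the error probability, let $p_c$ be the channel bit error probability. 
From $p_c$, determine the channel symbol error rate $p_{symbol}$: 

$$p_{symbol} = 1 - (1 - p_c)^m$$

Let $p_u(E)$ be the probability of undetected error (symbol error): 

$$p_u(E) = \sum_{i=1}^{N} A_i\, p_{symbol}^{\,i} (1 - p_{symbol})^{N-i}$$

where the weight distribution is 

$$A_0 = 1, \qquad A_j = 0 \quad \text{for } 1 \le j \le N - K$$

$$A_j = \binom{N}{j} \sum_{h=0}^{j-1-(N-K)} (-1)^h \binom{j}{h} \left[ q^{\,j-h-(N-K)} - 1 \right] \quad \text{for } (N - K) + 1 \le j \le N$$

The probability of decoding error (symbol error) is 

$$p(E) = \sum_{i=t+1}^{N} \binom{N}{i} p_{symbol}^{\,i} (1 - p_{symbol})^{N-i}$$

The total symbol error probability is 

$$p_{tot} = p_u(E) + p(E) \approx p(E) \approx \frac{1}{N} \sum_{j=t+1}^{N} j \binom{N}{j} p_{symbol}^{\,j} (1 - p_{symbol})^{N-j}$$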


Now, to find the bit error rate, 

$$\frac{(p_b)_s}{p_{tot}} = \frac{1}{2}\,\frac{M}{M - 1}$$

for M-ary multiple-frequency shift keying (MFSK) 


or 


or 


/ V ^ L5t + l 


(Pb> s - 2 X ^symbol) 


j=t+l 


5.4.2 Burst-Error Correcting Codes 

Burst-error-correcting codes include the following types: 

1. Burst-detecting and burst-correcting codes, with burst-correcting efficiency 

$$\text{efficiency} = \frac{2b}{n - k}$$

2. Fire codes, $g(x) = (x^c - 1)p(x)$, where p has degree m, 

$$c \ge d_{min} + b - 1$$

$$m \ge b$$

where b is burst length and the code corrects all bursts ≤ b and detects all bursts ≤ $d_{min}$ bits long. In general, 

$$b \le \frac{n - k}{2}$$

$$n - k \ge b - 1 + \log_2 n$$

$$n - k \ge 2(b - 1) + \log_2(n - 2b + 2)$$

Detecting a burst of length b requires b parity bits, and correcting a burst of length b requires 2b parity bits. 

A common application of cyclic codes is for error detection. Such a code is called a cyclic redundancy 
check (CRC) code. Since virtually all error-detecting codes in practice are of the CRC type, only this class of 
code is discussed. A CRC error burst of length b in the n-bit received code word is defined as a contiguous 
sequence or an end-around-shifted version of a contiguous sequence of b bits, in which the first and last bits 
and any number of intermediate bits are received in error. The binary (n,k) CRC codes can detect the following 
n-bit channel-error patterns: 

1. All CRC error bursts of length n - k or less 

2. A fraction $1 - 2^{-(n-k-1)}$ of the CRC error bursts of length b = n - k + 1 

3. A fraction $1 - 2^{-(n-k)}$ of the CRC error bursts of length b > n - k + 1 

4. All combinations of $d_{min} - 1$ or fewer errors 

5. All error patterns with an odd number of errors if the generator polynomial has an even number of 
nonzero coefficients 

Usually, the basic cyclic codes used for error detection are selected to have a very large block length n. Then, 
this basic code, in a systematic form, is shortened and is no longer cyclic. All standard CRC codes use this 
approach, so that the same generator polynomial applies to all the block lengths of interest. Three standard 
CRC codes are commonly used: 

1. CRC-12 code with $g(x) = 1 + x + x^2 + x^3 + x^{11} + x^{12}$ 

2. CRC-16 code with $g(x) = 1 + x^2 + x^{15} + x^{16}$ 

3. International Telegraph and Telephone Consultative Committee (CCITT) CRC code with 
$g(x) = 1 + x^5 + x^{12} + x^{16}$ 

4. A more powerful code with 

$$g(x) = 1 + x + x^2 + x^4 + x^5 + x^7 + x^8 + x^{10} + x^{11} + x^{12} + x^{16} + x^{22} + x^{23} + x^{26} + x^{32}$$

has been proposed where extra detection capability is needed. 
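
A CRC check is just the systematic cyclic encoding procedure applied with one of these generators. The Python sketch below uses the CCITT polynomial and plain polynomial division; it deliberately omits the register presets, bit reflection, and final inversion that many deployed CRC variants add, so its output should not be expected to match any particular standard implementation. The names `poly_mod` and `append_crc` are hypothetical.

```python
CCITT = (1 << 16) | (1 << 12) | (1 << 5) | 1   # x^16 + x^12 + x^5 + 1


def poly_mod(bits, g=CCITT):
    """Remainder of the bit sequence (highest-order bit first) modulo g(x)."""
    deg = g.bit_length() - 1
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit
        if reg >> deg:          # degree reached deg: subtract g(x)
            reg ^= g
    return reg


def append_crc(message, g=CCITT):
    """Return message followed by the n - k check bits."""
    deg = g.bit_length() - 1
    rem = poly_mod(message + [0] * deg, g)     # x^(n-k) u(x) mod g(x)
    return message + [(rem >> i) & 1 for i in range(deg - 1, -1, -1)]


message = [1, 0, 1, 1, 0, 0, 1, 0]
word = append_crc(message)
print(poly_mod(word) == 0)       # a valid word leaves a zero remainder

corrupted = list(word)
for pos in range(2, 18):         # burst of length 16 = n - k
    corrupted[pos] ^= 1
print(poly_mod(corrupted) != 0)  # the burst is detected
```

The second check illustrates item 1 of the list above: any error burst of length n - k or less leaves a nonzero remainder.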

5.4.3 Golay Code 

The weight enumerator for Golay code (23,12) is 

A(z) = 1 + 253 z 1 + 5062 s + 1288z u +1288z 12 + 506z 15 + 253z 16 + z 23 
Code (23,12) has d = 7 and t = 3 and corrects up to three errors. 


(r 23 + 1 j = (1 + x)g, (x)g2 (x) 

£[ (x) = 1 + x 2 + x A + x 5 + x 6 + x 10 + x 1 1 
g 2 (x) = 1 + X + x 5 + x 6 + x 1 + x 9 + jc" a m 69 (x) 


Recall that 


2 k < 


2 ” 


1 + n + (5) + - + (") 


For n = 23, and t = 3, 


(f) + (f) + (f) + . = 204S 
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but 2048 = 2 1 1 , 


2 



.\* = 12 


Thus, $2^{12}$ code words equals 4096 spheres of Hamming radius 3, closely packed. Each sphere contains $2^{11}$
vectors. There are $2^{n-k} = 2^{11}$ syndromes, which correspond one to one to all error patterns. Adding the overall
check bit gives code (24,12) (then r = 1/2), which detects all patterns of up to four errors. The extended code
(24,12) has $d_{\min} = 8$. Using the decoding table concept shows that exactly n patterns differ from the correct
pattern in one position, $\binom{n}{2}$ patterns differ in two positions, etc. Since there are almost always some patterns
left over (after assigning all those that differ in t or fewer places),

$$N_e \ge 1 + n + \binom{n}{2} + \cdots + \binom{n}{t}$$

where $N_e$ is the number of n-tuples in a column. Since there are $2^n$ possible sequences, the number of code
words $N_c$ obeys

$$N_c \le \frac{2^n}{1 + n + \binom{n}{2} + \cdots + \binom{n}{t}}$$

(sphere packing bound). For an (n,k) code, $N_c = 2^k$; thus,

$$2^{n-k} \ge 1 + n + \binom{n}{2} + \cdots + \binom{n}{t}$$

Golay noted that n = 23, k = 12, and t = 3 provide the equality in the above; thus, the “perfect” (23,12) code.
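The equality in the sphere packing bound is easy to check numerically. The short Python sketch below (function names are illustrative, not from the source) evaluates the sphere volume and tests whether an (n,k) code with error-correcting power t fills the space exactly.

```python
from math import comb

def sphere_volume(n, t):
    """Number of n-tuples within Hamming distance t of a code word."""
    return sum(comb(n, i) for i in range(t + 1))

def is_perfect(n, k, t):
    """True when 2^k spheres of radius t exactly fill the 2^n n-tuples."""
    return 2 ** k * sphere_volume(n, t) == 2 ** n

print(sphere_volume(23, 3))    # 1 + 23 + 253 + 1771 = 2048 = 2^11
print(is_perfect(23, 12, 3))   # Golay (23,12)
print(is_perfect(7, 4, 1))     # Hamming (7,4)
print(is_perfect(24, 12, 3))   # extended Golay: bound holds but not with equality
```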


5.4.4 Other Codes 

The following is some miscellaneous information about codes: 

1. Hamming codes are single-error-correcting BCH (cyclic). 

2. For t = 1 codes,

$$k \le n - \log_2(n + 1)$$

$$r = \frac{k}{n} \le 1 - \frac{1}{n}\log_2(n + 1), \qquad 2^n \ge 2^k (n + 1)$$

3. A multidimensional code uses a matrix for a code word. 

4. An alternative to the (n,k) notation uses M(n,d), where d is 

5. A rectangular or product code produces checks on both the columns and rows of a matrix that is loaded 
with the message. That is, 
[Diagram: the message digits $m_1, \ldots, m_j$ fill the first row of the matrix, $m_{j+1}, \ldots$ the next row, and so
on; a row check is appended to each row and a column check to each column.]

6. Hardware cost is proportional to (rc)-(?). 

7. If searching for a code to apply to a system, see page 124 of Peterson and Weldon (1972) (i.e., given
required n, k, and $d_{\min}$, is a code available?).

5.4.5 Examples 

Example 5.9 

The probability of one code word being transformed to another code word is 



▲ 


Example 5.10 

Reed-Muller codes are specified by $n = 2^m$,

$$k = 1 + \binom{m}{1} + \cdots + \binom{m}{r}, \qquad n - k = 1 + \binom{m}{1} + \cdots + \binom{m}{m-r-1}$$

$$d = 2^{m-r}$$


Example 5.11 

Maximum-length shift register codes (MLSR) are defined by

$$(n,k) \triangleq (2^m - 1,\ m), \qquad m = 1, 2, 3, \ldots$$

They are duals of Hamming $(2^m - 1,\ 2^m - 1 - m)$ codes. All code words have the same weight of $2^{m-1}$ (except the
all-zero word). The distance is $d_{\min} = 2^{m-1}$. To encode, load the message and shift the register to the left
$2^m - 1$ times.
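The equal-weight property can be checked with a small simulation. The Python sketch below assumes m = 4 and a feedback rule s[t+4] = s[t] XOR s[t+1] (characteristic polynomial $x^4 + x + 1$, which is primitive). The tap choice and function name are assumptions for illustration, since the text does not specify a register.

```python
from itertools import product

def mlsr_codeword(message, taps):
    """Load the m message bits, then run the feedback register until 2^m - 1 bits exist.

    taps: offsets i such that s[t + m] = XOR of s[t + i].
    """
    m = len(message)
    s = list(message)
    for t in range(2 ** m - 1 - m):
        s.append(sum(s[t + i] for i in taps) % 2)
    return s

M = 4
TAPS = (0, 1)   # assumed: s[t+4] = s[t] + s[t+1]  (x^4 + x + 1, valid for m = 4 only)

codewords = [mlsr_codeword(msg, TAPS) for msg in product((0, 1), repeat=M)]
weights = sorted({sum(cw) for cw in codewords})
print(len(codewords[0]), weights)   # block length 15; weights 0 and 2^(m-1) = 8
```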



▲ 
Example 5.12

Soft-decision decoders use the Euclidean distance between the received vector and permissible code vectors.

For example, suppose three successive waveform voltages from the demodulator are -0.1, 0.2, and 0.99 (a hard
decision about zero would yield (011) as the word). Let each of these voltages be denoted by $y_i$, and assume
that some predetermined voltage levels in the decoder have been assigned $x_i$. The Euclidean distance between
signal levels is defined as

$$\sum_{i=1}^{3} (y_i - x_i)^2$$

In soft-decision decoding, this distance measure is used to find the closest code word.
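A minimal Python sketch of this search follows. The code book (a single-parity-check code of length 3) and the decoder levels of -1 V for a zero and +1 V for a one are assumptions made for illustration; the text does not specify either.

```python
def euclidean_sq(y, x):
    """Squared Euclidean distance between received voltages y and levels x."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

def soft_decode(y, codewords, level=1.0):
    """Return the code word whose voltage levels lie closest to y."""
    def levels(cw):
        return [level if bit else -level for bit in cw]   # assumed mapping 0 -> -V, 1 -> +V
    return min(codewords, key=lambda cw: euclidean_sq(y, levels(cw)))

received = [-0.1, 0.2, 0.99]                              # voltages from the example
codebook = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # hypothetical even-parity code

best = soft_decode(received, codebook)
print(best, round(euclidean_sq(received, [-1.0, 1.0, 1.0]), 4))
```

With these assumptions the soft decision agrees with the hard decision (011), but the distances also show how marginal the first two voltages are.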


Example 5.13

In general, $d_{\min} \le n - k + 1$ (Singleton bound). The equality implies a maximum-distance separable code;
R-S codes are such codes. Some upper and lower bounds on $d_{\min}$ exist (fig. 5.8). Some formulas are

1. Gilbert-Varshamov: For a q-ary code

i=0 **0 


2. Plotkin 

$$k \le n - 2d_{\min} + 2 + \log_2 d_{\min}$$

3. Griesmer: Let $\lceil d/2 \rceil$ represent the smallest integer that is not less than d/2.


Figure 5.8.-Some classic upper and lower bounds on $d_{\min}$ for (n,k) block codes, plotted against r = k/n.

4. Hamming


n > 


y'M 

h j 


/=o 

▲ 


Example 5.14 

The distance between three words obeys the triangle inequality

$$d(x,y) + d(y,z) \ge d(x,z) \tag{a}$$

Observe that

$$W(x \oplus z) \le W(x) + W(z) \tag{b}$$

which follows from the definition of weight and modulo-2 addition. In equation (b), replace x by $x \oplus y$ and
z by $y \oplus z$. Then,

$$(x \oplus y) \oplus (y \oplus z) = x \oplus z$$

Use these in equation (b) to give

$$W(x \oplus z) \le W(y \oplus z) + W(x \oplus y)$$

or

$$d(x,z) \le d(y,z) + d(x,y)$$

since

$$d(A,B) \triangleq W(A \oplus B)$$

▲


Example 5.15 

The structure for codes developed over GF($2^m$) is as follows: For example, let m = 4, giving GF(16). The
elements are

0 (0000)

1 (1000)

α

⋮

Let the string of input information bits be represented by x’s

xxxx xxxx xxxx ...

First, divide the string into four-bit blocks where each block is a symbol or element from GF(16), as shown 
above. Next, clock the symbols into the encoder and output coded symbols. 


Example 5.16 

To find a code, use appendixes in Clark and Cain (1981), Lin and Costello (1983), and Peterson and Weldon 
(1972). The tables are given in octal. 


Octal    Binary
0        000
1        001
2        010
3        011
4        100
5        101
6        110
7        111

For example, octal 3525 means 011 101 010 101, which corresponds to the generator polynomial

$$g(x) = x^{10} + x^9 + x^8 + x^6 + x^4 + x^2 + 1$$

Also, 23 corresponds to 010 011 → $x^4 + x + 1$, or

$$g(x) = 1 + x + x^4$$

which is an (n,k) = (15,11) code.
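Reading a generator polynomial out of an octal table entry is mechanical, as the following Python sketch shows. The function name is illustrative only.

```python
def octal_to_exponents(octal_str):
    """Exponents of x with nonzero coefficients, highest first, for an octal table entry."""
    value = int(octal_str, 8)    # each octal digit contributes three binary digits
    return [e for e in range(value.bit_length() - 1, -1, -1) if (value >> e) & 1]

print(octal_to_exponents("3525"))   # exponents of x^10 + x^9 + x^8 + x^6 + x^4 + x^2 + 1
print(octal_to_exponents("23"))     # exponents of x^4 + x + 1
```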
Chapter 6

Convolutional Coding

Convolutional encoding is more complex than block coding. Its explanation is somewhat involved,
since notation and terminology are not standard in the literature. Convolutional codes are “tree” or
“recurrent” in that some checks depend on previous checks. Following Lin and Costello (1983), a code
is denoted by (n,k,m), where k inputs produce n outputs and m is the memory order of the encoder. If the
encoder has a single shift register, m is its number of delay elements. For the encoder in figure 6.1,
m = 3. For each bit entering, the commutator rotates and outputs two bits; thus, the code is denoted as
(2,1,3). First, the impulse responses of the encoder are defined to be the two output sequences $v^{(1)}$ and
$v^{(2)}$ when u = (1 0 0 0 ...), that is, a one followed by an infinite string of zeros. The shift register is
loaded with zeros before applying the input. Observe that four nodes feed the output modulo-2 adders,
and thus the impulse response contains four bits. By placing a one at the input node (the three delay
elements are still loaded with zeros), $v^{(1)} = 1$ and $v^{(2)} = 1$.

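After moving the one through the register,

$$v^{(1)} = (1\,0\,1\,1) \triangleq g^{(1)} \tag{6.1}$$

$$v^{(2)} = (1\,1\,1\,1) \triangleq g^{(2)} \tag{6.2}$$

where $g^{(1)}$ and $g^{(2)}$ are the impulse responses for this encoder. They are also called generator sequences,
connection vectors, or connection pictorials. The encoding equations become

$$v^{(1)} = u * g^{(1)} \tag{6.3}$$

$$v^{(2)} = u * g^{(2)} \tag{6.4}$$

where * represents convolution in discrete modulo-2 fashion. For the general case, let

$$g^{(1)} = \left(g_0^{(1)}, g_1^{(1)}, g_2^{(1)}, \ldots, g_m^{(1)}\right) \tag{6.5}$$

$$g^{(2)} = \left(g_0^{(2)}, g_1^{(2)}, g_2^{(2)}, \ldots, g_m^{(2)}\right) \tag{6.6}$$

$$v = \left(v_0^{(1)} v_0^{(2)},\ v_1^{(1)} v_1^{(2)},\ v_2^{(1)} v_2^{(2)},\ \ldots\right) \tag{6.7}$$

where v is the interlacing of $v^{(1)}$ and $v^{(2)}$; then, a compact encoding equation is

$$v = uG$$

where

$$G = \begin{bmatrix}
g_0^{(1)}g_0^{(2)} & g_1^{(1)}g_1^{(2)} & g_2^{(1)}g_2^{(2)} & \cdots & g_m^{(1)}g_m^{(2)} & & \\
 & g_0^{(1)}g_0^{(2)} & g_1^{(1)}g_1^{(2)} & \cdots & g_{m-1}^{(1)}g_{m-1}^{(2)} & g_m^{(1)}g_m^{(2)} & \\
 & & g_0^{(1)}g_0^{(2)} & \cdots & g_{m-2}^{(1)}g_{m-2}^{(2)} & g_{m-1}^{(1)}g_{m-1}^{(2)} & g_m^{(1)}g_m^{(2)} \\
 & & & \ddots & & & \ddots
\end{bmatrix}$$

is the generator matrix (of infinite extent); blank entries are zero.

Figure 6.1.-One implementation of (2,1,3) convolutional encoder.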

Example 6.1 
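For

$$g^{(1)} = 1011, \qquad g^{(2)} = 1111$$

the generator matrix is

$$G = \begin{bmatrix}
11 & 01 & 11 & 11 & & & \\
 & 11 & 01 & 11 & 11 & & \\
 & & 11 & 01 & 11 & 11 & \\
 & & & \ddots & & & \ddots
\end{bmatrix}$$

For u to five places (i.e., u = (10111)),

$$G = \begin{bmatrix}
11 & 01 & 11 & 11 & 00 & 00 & 00 & 00 \\
00 & 11 & 01 & 11 & 11 & 00 & 00 & 00 \\
00 & 00 & 11 & 01 & 11 & 11 & 00 & 00 \\
00 & 00 & 00 & 11 & 01 & 11 & 11 & 00 \\
00 & 00 & 00 & 00 & 11 & 01 & 11 & 11
\end{bmatrix}$$
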

The previous encoder can be redrawn in other ways, and this allows different means of describing the 
encoding procedure. In figure 6.2, the encoder has been redrawn by using a four-stage shift register; but 
observe that the first cell receives the first digit of u on the first shift. In the previous representation, the first 
output occurred when the first bit was at node 1 (to the left of the first cell). Another set of connection vectors 
Gj can be defined for this encoder: 


$$G_1 = 11, \quad G_2 = 01, \quad G_3 = 11, \quad G_4 = 11 \tag{6.10}$$

where the subscripts refer to the register delay cells. The number of digits in each vector is equal to the number
of modulo-2 adders. Let $G^+$ be a generator matrix and again let u have five places; then,

$$G^+ = \left[\,G_1\ G_2\ G_3\ G_4\,\right]$$

or

$$G^+ = \begin{bmatrix}
11 & 01 & 11 & 11 & & & & \\
 & 11 & 01 & 11 & 11 & & & \\
 & & 11 & 01 & 11 & 11 & & \\
 & & & 11 & 01 & 11 & 11 & \\
 & & & & 11 & 01 & 11 & 11
\end{bmatrix} \tag{6.11}$$




Figure 6.2.— Alternative encoder circuit of (2,1 ,3) convolutional 
encoder of figure 6.1 . 
Figure 6.3.-Third representation of (2,1,3) convolutional encoder of figure 6.1.

which is just G in the previous example. Another representation of the encoder is given in figure 6.3. Here, the
machine is at its fourth shift, so that from equation (6.7) the output is the fourth pair $v^{(1)} v^{(2)}$ of v. From either
the example or equation (6.11) for u = (10111),

$$v = (11, 01, 00, 01, 01, 01, 00, 11) \tag{6.12}$$

In figure 6.3 (as $u_4$ enters)

$$u_4 G_1 \oplus u_3 G_2 \oplus u_2 G_3 \oplus u_1 G_4 = 1 \cdot (11) \oplus 1 \cdot (01) \oplus 0 \cdot (11) \oplus 1 \cdot (11)$$

$$= 11 \oplus 01 \oplus 00 \oplus 11 = 10 \oplus 11 = 01$$

which is indeed the value in equation (6.12) (the fourth pair). Thus, the fourth pair of outputs depends on the
present input $u_4$ and the memory ($u_3$, $u_2$, $u_1$), or the “convolution.” Note that the last representation does
not use a commutator.

Here, the same encoder has been described with three different circuit representations and two different sets 
of “connection vectors.” This multiplicity of representations and terminology can cause some confusion if the 
reader is not careful when scanning the literature. 
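Whatever the circuit representation, the encoding reduces to two modulo-2 convolutions followed by interlacing. The Python sketch below reproduces equation (6.12) for the (2,1,3) encoder; the function names are illustrative, and the sequences are truncated to the finite lengths used in the example.

```python
def conv_mod2(u, g):
    """Discrete convolution of two bit sequences with modulo-2 addition."""
    out = [0] * (len(u) + len(g) - 1)
    for i, ui in enumerate(u):
        for j, gj in enumerate(g):
            out[i + j] ^= ui & gj
    return out

def encode(u, generators):
    """Convolve u with each generator sequence and interlace the output streams."""
    streams = [conv_mod2(u, g) for g in generators]
    return [bit for group in zip(*streams) for bit in group]

G1 = [1, 0, 1, 1]    # g(1)
G2 = [1, 1, 1, 1]    # g(2)
v = encode([1, 0, 1, 1, 1], [G1, G2])

pairs = ["".join(map(str, v[i:i + 2])) for i in range(0, len(v), 2)]
print(",".join(pairs))   # compare with equation (6.12)
```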


6.1 Constraint Length 

Several definitions for the term “constraint length” can be found in the literature. The reasons for this 
confusing state of affairs will become evident as the discussion progresses. One reason is the variability in 
encoder design. For the simple case of a one-bit-in, three-bit-out encoder (fig. 6.4), the output commutator 
representation means that three output bits are generated for each input bit. Therefore, the code has r = 1/3 or 
(n,k) = (3,1). Each block of n output bits depends on the present input bit (which resides in the first cell of the
shift register), as well as on two previous input bits. Loosely, the encoder’s memory is 2, which is both the 
number of previous input bits and the number of shifts by which a given bit can influence the output (do not 
count the shift when the bit first enters the shift register). The number of modulo-2 adders is three; in general, 
let v represent the number of such adders. Thus, here v = n. Each input bit affects three consecutive three-bit 
output blocks. So what is the “memory” of such an encoder? The various definitions of constraint length are 
variations on the notion of memory. The previous circuit can be redrawn as shown in figure 6.5 (upper part). 
Unfortunately, this encoder can also be drawn as shown in the lower part of the figure. The difference is the 
decision of placing the present input bit into a shift register stage or not. Therefore, how many shift register 
stages are needed for this particular (n,k) coding scheme?

Another encoder (fig. 6.6) has two input bits and three output bits per cycle; thus, (n,k) = (3,2). Finally, in
the case shown in figure 6.7, where k = 3 and n = 4, if the three input rails are considered to be inputs to shift
registers, there is a zero-, a one-, and a two-stage register. In the case shown in figure 6.8, where k = 2 and
n = 3, the output commutator rotates after two input bits enter. Two other variations (fig. 6.9) show some 
modulo-2 adders that deliver outputs to other adders. 
Figure 6.8.-Encoder where each register holds two input digits.

Figure 6.9.-Alternative encoder schemes wherein some modulo-2 adders feed other adders.


With these variations for encoder construction a “memory” is somewhat hard to define. Consider a variation 
on figure 6.5 depicted in figure 6.10. Here, each “delay element” consists of k stages and the input commutator 
would wait at each tap until k bits entered the machine. After loading the third, or last, tap the output 
commutator would sweep the remaining three outputs. For simplicity, assume that each “delay element” holds 
only one bit; then, each shift register consists of $K_i$ single-bit elements. Here $K_0 = 0$, $K_1 = 1$, and $K_2 = 2$. The
fact that the subscript equals the number of delay elements in this case is just an accident. (Figure 6.11 gives
some situations where the notation can be confusing.)

With this background, the following definitions can be stated: 

1. Let $K_i$ be the length (in one-bit stages) of the ith shift register. Let k be the number of input taps; then,

$$m \triangleq \max_{1 \le i \le k} K_i \qquad \text{memory order}$$

$$K \triangleq \sum_{i=1}^{k} K_i \qquad \text{total encoder memory}$$

Figure 6.10.-Encoder wherein k-bit registers are employed (variation on circuit in fig. 6.5).

Figure 6.11.-Examples of encoders where constraint length, memory order, and number of shift
registers can be confused. (a) (2,1,3); $m = K_1 = K = 3$. (b) (3,2,1); $K_1 = 1$, $K_2 = 1$, $m = K_1 = K_2 = 1$, K = 2.

2. Following Lin and Costello (1983), k is the number of input taps and n is the number of output taps. 

Specify a code by (n,k,m). Then, 


$$\mathrm{CL} \triangleq n_A = n(m + 1)$$

which says the constraint length (CL) is the maximum number of output bits that can be affected by a single
input bit. This word definition is most often what is meant by constraint length. However, a slew of other terms
is used. Sometimes, m is called the number of state bits; then,

$$\text{memory span} \triangleq m + k$$

Often the memory span is called the CL. Sometimes, m is called the CL. Sometimes, $n_A$ above is called the
constraint span. In many situations, the CL is associated with the shift registers in different ways. For example,
in figure 6.12, the K = 4 means the total encoder memory; whereas K = 2 is the number of k-bit shift registers.

3. Finally, the code rate needs to be clarified. A convolutional encoder generates n encoded bits for each k
information bits, and r = k/n is called the code rate. Note, however, that for an information sequence of finite
length kL, the corresponding code word has length n(L + m), where the final n·m outputs are generated after
the last nonzero information block has entered the encoder. In other words, an information sequence is
terminated with all-zero blocks in order to allow the encoder memory to clear. The block code rate is given by
kL/n(L + m), the ratio of the number of information bits to the length of the code word. If L ≫ m, then
L/(L + m) ≈ 1, and the block code rate and the convolutional rate are approximately equal. If L were small, the
ratio kL/n(L + m), which is the effective rate of information transmission, would be reduced below the code
rate by a fractional amount

$$\frac{m}{L + m}$$

called the fractional rate loss. The nm blocks of zeros following the last information block are called the tail or
flush bits.

Figure 6.12.-Encoder where two two-bit registers are used and corresponding notational ambiguity.
k = 2, m = 3, CL = K; K = 4 or K = 2.

Figure 6.13.-Popular r = 1/2, K = 7 = (m + k) convolutional encoder.

4. Quite often, the memory span (m + k) is designated as K, the constraint length. For example, a very
popular r = 1/2, K = 7 encoder (fig. 6.13) means (n = 2, k = 1). Here, the CL refers to the number of input
bits (m + k = 6 + 1), or the memory span.
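The effect of the tail on the effective rate can be put in numbers. The Python sketch below evaluates the block code rate kL/n(L + m) and the fractional rate loss m/(L + m); the function names and the sample values of L are chosen only for illustration.

```python
from fractions import Fraction

def block_rate(n, k, m, L):
    """Effective rate kL / n(L + m) of a terminated (n,k,m) convolutional code."""
    return Fraction(k * L, n * (L + m))

def fractional_rate_loss(m, L):
    """Fraction m / (L + m) by which the block rate falls below k/n."""
    return Fraction(m, L + m)

# (2,1,3) code with a short (L = 5) and a long (L = 1000) information sequence.
for L in (5, 1000):
    print(L, block_rate(2, 1, 3, L), fractional_rate_loss(3, L))
```

For L = 5 the rate drops from 1/2 to 5/16, a loss of 3/8; for L = 1000 the loss is only 3/1003.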


6.2 Other Representations 

With the convolutional term, constraint length, and other ideas covered, the many alternative representations 
for encoders can now be discussed. The example below summarizes the previous notions. 

Example 6.2 

Consider an encoder with delay cells $R_i$ consisting of k subcells each (fig. 6.14). Often, the constraint length
is the number of delay cells K. Here, every n (equal to v) outputs depends on the present k (those in cell 1) and
(K - 1) previous k-tuples. Then, the code can be described as (n,k,K), and the constraint span is (K/k)v.

Figure 6.14.-General convolutional encoder (upper) and specific K = 3, v = 2 example.

For simplicity, assume one-bit cells; then, an encoder could be as shown in the lower portion of figure 6.14.
Write the output 3-tuple at a particular shift as ($v_1$, $v_2$, $v_3$). Then,

$$v_1 = R_1$$

$$v_2 = R_1 \oplus R_2 \oplus R_3$$

$$v_3 = R_1 \oplus R_3$$

Let the input stream be $u = u_0, \ldots, u_A, u_B, u_C$ and assume that $u_C$ is shifted into $R_1$. Then, $R_2$ contains $u_B$ and
$R_3$ contains $u_A$ and

$$v_1 = u_C$$

$$v_2 = u_C \oplus u_B \oplus u_A$$

$$v_3 = u_C \oplus u_A$$

The next representation uses a delay operator D as follows: Define

$$g^{(1)}(D) = g_0^{(1)} + g_1^{(1)} D + g_2^{(1)} D^2 + \cdots + g_m^{(1)} D^m$$

$$g^{(2)}(D) = g_0^{(2)} + g_1^{(2)} D + g_2^{(2)} D^2 + \cdots + g_m^{(2)} D^m$$

▲ 
For the connection vectors of the previous sections, this means that

$$g^{(1)} = (1011) \Rightarrow g^{(1)}(D) = 1 + D^2 + D^3$$

$$g^{(2)} = (1111) \Rightarrow g^{(2)}(D) = 1 + D + D^2 + D^3$$

Define

$$g(D) = g^{(1)}(D^2) + D\,g^{(2)}(D^2)$$

Then,

$$v(D) \triangleq u(D^2)\,g(D)$$

for

$$u = (10111) \Rightarrow u(D) = 1 + D^2 + D^3 + D^4$$

Then,

$$v(D) = u(D^2)\left[(1 + D^4 + D^6) + D(1 + D^2 + D^4 + D^6)\right]$$

$$= 1 + D + D^3 + D^7 + D^9 + D^{11} + D^{14} + D^{15}$$

after the multiplication and modulo-2 additions, where the exponents refer to the position in the sequence.
Recall from equation (6.12) that

$$v = (11, 01, 00, 01, 01, 01, 00, 11)$$

Therefore, the above expression in D notation gives

$$v = (1101000101010011)$$

This is again just the polynomial representation for a bit stream.
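The polynomial form can be checked with carry-less (modulo-2) multiplication. In the Python sketch below, a polynomial in D is stored as an integer whose bit i is the coefficient of $D^i$; this packing and the helper names are assumptions made for illustration.

```python
def from_bits(bits):
    """Pack a coefficient list (bits[0] multiplies D^0) into an integer."""
    return sum(b << i for i, b in enumerate(bits))

def spread(p, n):
    """Substitute D -> D^n: the coefficient of D^i moves to D^(n*i)."""
    out, i = 0, 0
    while p:
        if p & 1:
            out |= 1 << (n * i)
        p >>= 1
        i += 1
    return out

def gf2_mul(a, b):
    """Multiply two polynomials with modulo-2 coefficient arithmetic."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

g1 = from_bits([1, 0, 1, 1])               # 1 + D^2 + D^3
g2 = from_bits([1, 1, 1, 1])               # 1 + D + D^2 + D^3
g = spread(g1, 2) ^ (spread(g2, 2) << 1)   # g(D) = g1(D^2) + D g2(D^2)
u = from_bits([1, 0, 1, 1, 1])             # 1 + D^2 + D^3 + D^4
v = gf2_mul(spread(u, 2), g)               # v(D) = u(D^2) g(D)

exponents = [i for i in range(v.bit_length()) if (v >> i) & 1]
print(exponents)
```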

The next representation is the “tree” for the encoder (fig. 6.15). The particular path taken through the tree is 
determined by the input sequence. If a zero appears, an upward branch is taken. If a one appears, a downward 
branch is taken. The output sequence for a given input sequence is read from the diagram as shown. For the 
input sequence 1011, the output is 11 01 00 10. 

The next representation is the state diagram. The state of the encoder is defined to be the bits in the shift
register with the following association. For the (2,1,3) code developed earlier (fig. 6.1), the states are labeled
by using the following rule: The set of states is denoted by

$$S_0, S_1, S_2, \ldots, S_{2^K - 1}$$

where the subscript i of state $S_i$ is tied to the register contents $(b_0\, b_1\, b_2 \cdots)$ through the expansion

$$i = b_0 2^0 + b_1 2^1 + b_2 2^2 + \cdots$$

Figure 6.15.-Encoder (a) and tree for encoder (b).

For this example, K = 3 (total encoder memory). Then, eight states are possible: $S_0$, $S_1$, $S_2$, $S_3$, $S_4$, $S_5$, $S_6$, and $S_7$.
The state notation and register contents correspond as follows:

Binary K-tuple    State
000               $S_0$
001               $S_4$
010               $S_2$
011               $S_6$
100               $S_1$
101               $S_5$
110               $S_3$
111               $S_7$

The corresponding state diagram (fig. 6.16) has the following interpretation: If the encoder is in state $S_4$, for
example, a zero input causes an output of 11 and movement to state $S_0$, and a one input causes an output of 00
and movement to state $S_1$.

The trellis is a state diagram with a time axis, and that for the above state diagram is given in figure 6.17.
Each column of nodes in the trellis represents the state of the register before any input. If a register has K
stages, the first K - k bits in the register determine its state. Only K - k bits are needed, since the end k bits are
dumped out as the next k input bits occur. The trellis has $2^{K-k}$ nodes in a column, and successive columns refer
to successive commutation times. Branches connecting nodes indicate the change of register state as a
particular input of k bits is shifted in and a commutation is performed. A branch must exist at each node for
each possible k-bit input. If the register is in some initial state, not all nodes are possible until K - k bits have
been shifted in and the register is free of its initial condition. After Lk input bits, a distance of L columns has
been progressed into the trellis, producing Ln output symbols. The trellis for the (2,1,3) code under
consideration has eight rows of nodes, corresponding to the eight states $S_0, S_1, \ldots, S_7$. Each column of nodes
represents a time shift (when a commutation occurs). The dashed and solid lines represent paths taken for a one
or zero input. For example, the input sequence u = 111 (assuming that the register is in the $S_0$ state) takes three
dashed paths and winds up at state $S_7$. The outputs are labeled so that the output sequence is 111001. A block
of zeros will sooner or later move the register back to state $S_0$; this is called flushing. For this code, three zeros
are needed to flush (clear) the encoder (to return to state $S_0$ from any other state).

Figure 6.16.-State diagram for encoder given in figure 6.1 (branches labeled input/output).

Figure 6.17.-Trellis diagram for state diagram of figure 6.16.
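The state transitions quoted above can be tabulated directly from the connection vectors. The following Python sketch assumes the register contents are written most recent bit first, ($b_0\, b_1\, b_2$), with state index $i = b_0 + 2b_1 + 4b_2$, which matches the table given earlier; the function names are illustrative.

```python
G = ((1, 0, 1, 1), (1, 1, 1, 1))   # g(1), g(2) of the (2,1,3) encoder

def step(state, u):
    """One shift: returns (next state, output pair) for input bit u."""
    reg = (u,) + state                                   # present input plus memory
    out = tuple(sum(g[j] & reg[j] for j in range(4)) % 2 for g in G)
    return reg[:3], out

def index(state):
    """Subscript i of S_i for register contents (b0, b1, b2)."""
    return sum(b << i for i, b in enumerate(state))

def run(bits, state=(0, 0, 0)):
    """Follow a path through the trellis; returns (final state index, output bits)."""
    outputs = []
    for u in bits:
        state, out = step(state, u)
        outputs.extend(out)
    return index(state), outputs

S4 = (0, 0, 1)
print(step(S4, 0), step(S4, 1))   # transitions out of S4
print(run([1, 1, 1]))             # path for u = 111 starting in S0
```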


6.3 Properties and Structure of Convolutional Codes 

Recall that the codes have several modes of representation. The “algebraic” forms include connection 
pictorials, vectors, and polynomials; as well as generator matrices. The tree, state, and trellis diagrams are 
geometrical formulations. 

A rather academic point is that of a catastrophic encoder. Such an encoder can get hung up so that a long
string of ones produces, for example, three output ones followed by all zeros. If the three leading ones are
corrupted by the channel, the decoder can only assume that all zeros constitute the message; thus, a
theoretically arbitrarily long sequence of errors results.

Figure 6.18.-Example of catastrophic encoder.

Example 6.3

The encoder in figure 6.18 is a catastrophic encoder. Such machines can be easily recognized by noting the
connection vectors. They will have a common factor. Here,

$$g^{(1)} = 110 \rightarrow 1 + D$$

$$g^{(2)} = 101 \rightarrow 1 + D^2$$

but

$$1 + D^2 = (1 + D)(1 + D) = g^{(1)}(1 + D)$$

Thus,

$$\frac{g^{(2)}}{g^{(1)}} = 1 + D$$

and their common factor is 1 + D. Next, consider a code where

$$g^{(1)} = 1 + D^2 + D^3$$

$$g^{(2)} = 1 + D + D^2 + D^3$$

Now, $g^{(2)}/g^{(1)} = 1$ with a remainder of D. Any common factor would also have to divide the remainder D, and D
divides neither polynomial; hence, no common factor is present and the encoder is not catastrophic. In general,
if the greatest common divisor of the generator polynomials equals $D^l$ for $l \ge 0$, the code is not catastrophic. The state
diagram reveals catastrophic behavior when a self-loop of zero weight (that from state d in fig. 6.18) exists.
This zero-weight self-loop cannot be in either “end” state in the diagram (here, a or e). In this diagram, a and
e represent the same state. Systematic codes are never catastrophic.
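The common-factor test can be automated with Euclid's algorithm over GF(2). In the Python sketch below, a polynomial in D is stored as an integer whose bit i is the coefficient of $D^i$; the packing and the function names are assumptions for illustration, and the test applies to rate 1/2 encoders with two generator polynomials.

```python
def gf2_mod(a, b):
    """Remainder of a(D) divided by b(D) with modulo-2 coefficients (b nonzero)."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)   # cancel the leading term of a
    return a

def gf2_gcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclid's algorithm)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(g1, g2):
    """Catastrophic unless the GCD is a pure delay D^l (a single nonzero coefficient)."""
    d = gf2_gcd(g1, g2)
    return (d & (d - 1)) != 0

print(is_catastrophic(0b11, 0b101))      # 1 + D and 1 + D^2 share the factor 1 + D
print(is_catastrophic(0b1101, 0b1111))   # 1 + D^2 + D^3 and 1 + D + D^2 + D^3
```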


A 


6.4 Distance Properties 

Let A and B be two code words of length i branches in a trellis. The Hamming distance is as before

$$d_H(A, B) = w(A \oplus B)$$

Define the ith-order column distance function $d_c(i)$ as the minimum $d_H$ between all pairs of code words of
length i branches that differ in their first branch of the code tree. Another way of saying this is that $d_c(i)$ is the
minimum-weight code word over the first (i + 1) time units whose initial information block is nonzero. It
depends on the first n(i + 1) columns of G (for (n,k) code); hence, the word “column” in the definition. Two
special distances are defined in terms of the column distance function as follows:

$$d_{\min} = d_c(i = m)$$

$$d_{\text{free}} = d_c(i \rightarrow \infty)$$

The minimum distance occurs when i = m, the memory order; whereas $d_{\text{free}}$ is for arbitrarily long paths.
Quite often, $d_{\min} = d_{\text{free}}$. The distance profile is the set of distances

$$d = [d_c(1), d_c(2), d_c(3), \ldots]$$

In general, these distances are found by searching the trellis. An optimum distance code has a $d_{\min}$ that is
greater than or equal to the $d_{\min}$ of any other code with the same constraint length (memory order). An
optimum free distance code has a similar property with $d_{\text{free}}$.

The next measure is the determination of the weight distribution function $A_i$. Here, $A_i$ is the number of code
words of weight i (the number of branches is not important here). This set $\{A_i\}$ is found from the state diagram
as shown next.

The error correction power in a block code sense would say

$$t = \left\lfloor \frac{d_{\text{free}} - 1}{2} \right\rfloor$$

but this is a rather coarse measure. Observe for future reference that a tree repeats itself after K branchings. In
the trellis, there are $2^{K-1}$ nodes for $2^{K-1}$ states. For a given register, the code structure depends on the taps.
Nonsystematic codes have larger $d_{\text{free}}$, but systematic ones are less prone to the accumulation of errors.
The final topic for this chapter is the generating function T(x) for a code. It is defined as

$$T(x) = \sum_i A_i x^i$$

where $A_i$ is the number of code words of weight i. The function is derived by studying the state diagram for a
specific code. Problem 10.5 of Lin and Costello (1983) is used to describe the procedure. The code is
described as (3,1,2) with encoder diagram shown in figure 6.19. The connection vectors are

$$g^{(1)} = (110), \quad g^{(2)} = (101), \quad g^{(3)} = (111)$$

From the encoder diagram the state diagram can be drawn as shown in the lower part of the figure. Next, the
$S_0$ state is split into two parts as shown in figure 6.20, which constitutes the modified state diagram. Added to
the branches are branch gain measures $x^i$, where i is the weight of the n encoded bits on that branch. The $S_0$
state is separated to delineate paths that “reemerge” to that state after passing through several intermediate
states. If a self-loop is attached to the $S_0$ state, it is dropped at this step. From the modified state diagram, the
generating function can be determined by using signal flow graph procedures. The $S_0$ states on the left and
right are called the initial and final states of the graph, respectively. The terms needed are defined as

1. Forward path: A path connecting the initial and final states that does not go through any state more than
once

2. Path gain: The product of the branch gains along a path, denoted $F_i$

3. Loop: A closed path starting at any state and returning to that state without going through any other state
twice. A set of loops is “nontouching” if no state belongs to more than one loop in the set. Define

$$\Delta = 1 - \sum_i C_i + \sum_{j,k} C_j C_k - \sum_{l,o,p} C_l C_o C_p + \cdots$$

where $\sum_i C_i$ is the sum of the loop gains, $\sum_{j,k} C_j C_k$ is the product of the loop gains of two nontouching
loops summed over all pairs of nontouching loops, $\sum_{l,o,p} C_l C_o C_p$ is the product of the loop gains of three
nontouching loops summed over all triples of nontouching loops, etc. Next, define $\Delta_i$, which is exactly
like $\Delta$ but only for that portion of the graph not touching the ith forward path. That is, all states along
the ith forward path, together with all branches connected to these states, are removed from the graph
when computing $\Delta_i$. Mason’s formula for graphs gives the generating function as

$$T(x) = \frac{\sum_i F_i \Delta_i}{\Delta}$$

where the sum in the numerator is over all forward paths. First, calculate $\Delta$, and this requires counting all
loops. Figure 6.20(a) shows three loops, which are listed here along with their path gains.

Figure 6.20.-Modified state diagram (a) of figure 6.19(b) and path 2 and its subgraph (b).


Loop    States                   Gain
1       $S_3 S_3$                $C_1 = x$
2       $S_1 S_3 S_2 S_1$        $C_2 = x^4$
3       $S_1 S_2 S_1$            $C_3 = x^3$

There is only one set of nontouching loops, {loop 1, loop 3}, and the product of their gains is $C_1 C_3 = x^4$. Thus,
$\Delta$ is found as

$$\Delta = 1 - (x + x^4 + x^3) + x^4 = 1 - x - x^3$$

Now, to find $\Delta_i$, there are two forward paths

Forward path    States                      Gain
1               $S_0 S_1 S_3 S_2 S_0$       $F_1 = x^8$
2               $S_0 S_1 S_2 S_0$           $F_2 = x^7$

where the gains are also found. Because path 1 touches all states, its subgraph contains no states; thus,

$$\Delta_1 = 1$$

Because path 2 does not touch state $S_3$, its subgraph is that shown in figure 6.20(b). Only one loop exists here,
with gain = x; thus,

$$\Delta_2 = 1 - x$$

Figure 6.21.-Augmented state diagram of figure 6.20.

Finally, the transfer function is

$$T(x) = \frac{x^8 + x^7(1 - x)}{1 - x - x^3} = \frac{x^7}{1 - x - x^3} = x^7 + x^8 + x^9 + 2x^{10} + 3x^{11} + 4x^{12} + \cdots$$

with the following interpretation: The coefficients of $x^7$, $x^8$, and $x^9$ are all unity, so there is one code word each
with weights 7, 8, and 9. Continuing in this manner, there are two words with weight 10, three with weight 11, etc.
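The series coefficients can be cross-checked without Mason's formula by counting paths that leave $S_0$ and first return to it, grouped by output weight. The Python sketch below does this for the (3,1,2) code with the connection vectors given above; the function names are illustrative, and the state is taken as the two most recent input bits (an assumption about register ordering).

```python
from collections import defaultdict

def step(state, u):
    """(3,1,2) encoder with g(1) = 110, g(2) = 101, g(3) = 111; state = (u[-1], u[-2])."""
    s1, s2 = state
    out = (u ^ s1, u ^ s2, u ^ s1 ^ s2)
    return (u, s1), sum(out)               # next state and branch weight

def weight_spectrum(max_w):
    """A[w] = number of paths leaving S0 and first returning to S0 with output weight w."""
    A = [0] * (max_w + 1)
    counts = defaultdict(int)              # (state, accumulated weight) -> number of paths
    first, w0 = step((0, 0), 1)            # the only branch that leaves S0
    counts[(first, w0)] = 1
    # Every branch out of a nonzero state has weight >= 1, so visiting the
    # accumulated weights in increasing order sees each count only when final.
    for w in range(max_w + 1):
        for state in ((1, 0), (0, 1), (1, 1)):
            c = counts.get((state, w), 0)
            if not c:
                continue
            for u in (0, 1):
                nxt, bw = step(state, u)
                if w + bw > max_w:
                    continue
                if nxt == (0, 0):
                    A[w + bw] += c         # path has remerged with S0
                else:
                    counts[(nxt, w + bw)] += c
    return A

A = weight_spectrum(12)
print(A[7:13])   # coefficients of x^7 ... x^12 in T(x)
```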

Next, the augmented state diagram is made (fig. 6.21). Here, the branches are given added weighting. The
exponent of y is the weight of the information (input) bits on the branch, and each branch is given the factor z.
Repeating the previous calculation gives

Loop    States                   Gain
1       $S_3 S_3$                $C_1 = xyz$
2       $S_1 S_3 S_2 S_1$        $C_2 = x^4 y^2 z^3$
3       $S_1 S_2 S_1$            $C_3 = x^3 y z^2$

The pair of loops has gain $C_1 C_3 = x^4 y^2 z^3$; thus,

$$\Delta = 1 - (xyz + x^4 y^2 z^3 + x^3 y z^2) + x^4 y^2 z^3 = 1 - xyz - x^3 y z^2$$

The forward path 1 is $F_1 = x^8 y^2 z^4$; then, $\Delta_1 = 1$. The forward path 2 is $F_2 = x^7 y z^3$; then, $\Delta_2 = 1 - xyz$. The
generating function is therefore

$$T(x, y, z) = \frac{\sum_i F_i \Delta_i}{\Delta} = \frac{x^7 y z^3}{1 - xyz - x^3 y z^2}$$

$$= x^7 y z^3 + x^8 y^2 z^4 + x^9 y^3 z^5 + x^{10} y^2 z^5 + x^{10} y^4 z^6 + 2x^{11} y^3 z^6 + x^{11} y^5 z^7 + 3x^{12} y^4 z^7 + x^{12} y^6 z^8 + \cdots$$

with the following interpretation: The first term means that the code word with weight 7 has an information
sequence with weight 1 (y exponent) and length of 3 branches (z exponent). The other terms have similar
interpretations. This completes the discussion of convolutional encoders.
Chapter 7

Decoding of Convolutional Codes

The decoding of convolutional codes can be divided into three basic types: Viterbi, sequential, and threshold.
Viterbi and sequential are similar in that they search through the trellis or tree; whereas threshold can be used
for both block and convolutional codes. Threshold is also associated with the terms “majority logic,”
“feedback,” and “definite decoding.” Historically, sequential came first, but it is simpler to discuss Viterbi’s
algorithm first.

7.1 Viterbi’s Algorithm 

The idea in Viterbi’s algorithm is to select a string of received bits and compare them with all possible 
strings obtained by tracing all possible paths through the trellis. For a sufficiently long string and not many 
errors, it seems reasonable to assume that one path through the trellis should agree rather closely with the 
received string. In other words, the decoder has properly reproduced the sequence of states that the encoder 
performed. The few bits of disagreement are the channel-induced errors. Experience has shown that the correct, 
or most likely path, becomes evident after about five constraint lengths through the trellis. The scheme is 
therefore to compare and store all possible paths for a set number of steps through the trellis and then select the 
“survivor,” the most likely path. Some storage can be saved by closely studying the properties of paths through 
the trellis. To study these effects, a metric is defined as follows: Let v be the transmitted code word and r be 
the received sequence. For the DMC with channel transition probability r; | v,*), 


N-\ 

pi 7 1 i-n* i v >) 

i= 0 

v=(v 0 ,v„...,v Ar _ 1 ), r = (r 0 ,r 1 ,...,r w _ 1 ) 
Then, taking the log (to reduce to sums) gives 


N-i 

log p(r I v)= ^log/^r; I v,) 

;=o 

This is the log-likelihood function, and it is the “metric” associated with the path v. The notation of Lin and 
Costello (1983) uses 

$$M(\mathbf{r} \mid \mathbf{v}) \triangleq \log p(\mathbf{r} \mid \mathbf{v})$$

whereas others use $T(\mathbf{r} \mid \mathbf{v}) = \log p(\mathbf{r} \mid \mathbf{v})$. The metric for each segment along the path is

$$M(r_i \mid v_i) = \log p(r_i \mid v_i)$$

and is called a “branch metric.” Thus, the path metric is the sum of all branch metrics:

$$M(\mathbf{r} \mid \mathbf{v}) = \sum_{i=0}^{N-1} M(r_i \mid v_i)$$

A partial path metric for the first $j$ branches of a path is

$$M([\mathbf{r} \mid \mathbf{v}]_j) = \sum_{i=0}^{j-1} M(r_i \mid v_i)$$

For the binary symmetric channel with transition probability p, 

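$$\log p(\mathbf{r} \mid \mathbf{v}) = \log\left[p^{z}(1-p)^{N-z}\right]$$

where $z = d(\mathbf{r}, \mathbf{v})$ is the Hamming distance between $\mathbf{r}$ and $\mathbf{v}$. Thus,

$$M(\mathbf{r} \mid \mathbf{v}) = N \log(1-p) - z \log\left(\frac{1-p}{p}\right) = -A - Bz$$

where $A$ and $B$ are positive constants ($p < 0.5$). Therefore, minimizing the Hamming distance maximizes the
metric.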
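
The equivalence between the log-likelihood metric and the Hamming distance on the binary symmetric channel can be checked numerically. The following is a minimal sketch; the function names, the candidate sequences, and the value p = 0.1 are illustrative assumptions, not taken from the text.

```python
import math

def hamming_distance(r, v):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(a != b for a, b in zip(r, v))

def bsc_path_metric(r, v, p):
    """log p(r | v) for a BSC: N log(1 - p) - z log((1 - p) / p)."""
    n = len(r)
    z = hamming_distance(r, v)
    return n * math.log(1 - p) - z * math.log((1 - p) / p)

# Hypothetical received word and three hypothetical candidate code sequences.
r = [1, 1, 1, 1, 0, 1]
candidates = {
    "a": [0, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 0, 1],
    "c": [1, 1, 1, 0, 1, 0],
}
p = 0.1  # assumed crossover probability, p < 0.5

best_by_metric = max(candidates, key=lambda name: bsc_path_metric(r, candidates[name], p))
best_by_distance = min(candidates, key=lambda name: hamming_distance(r, candidates[name]))
print(best_by_metric, best_by_distance)
```

Both selections pick the same candidate, because the metric is a decreasing linear function of the distance whenever p < 0.5.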

The basis of Viterbi decoding is the following observation: If any two paths in the trellis merge to a single 
state, one of them can always be eliminated in the search for the optimum path. The path with the smaller net 
metric at this point can be dropped because of the Markov nature of the encoder states. That is, the present state 
summarizes the encoder history in the sense that previous states cannot affect future states or future output 
branches. If both paths have the same metric at the merging state, either one can be eliminated arbitrarily 
without altering the outcome. Thus, “ties” cause no problems. In Lin and Costello (1983), the metric is chosen
as

$$\text{metric} = \sum_{i=0}^{N-1} C_2\left[\log p(r_i \mid v_i) + C_1\right]$$

to bias the metrics for ease of computation. The constants $C_1$ and $C_2$ are chosen appropriately.

The storage required at each step in the trellis is straightforward, although the notation is not. Essentially, one
of the two paths entering a node is stored as the “survivor.” The notation varies between authors in whether
the counting is indexed from zero or from one. In Lin and Costello (1983), there are $2^K$ states at a step in the
trellis; others use $2^{K-1}$. Thus, the number of survivors is either $2^K$ or $2^{K-1}$ per level, or step, within the trellis.
If L is the constraint length, 2^ metrics are computed at each node, so that 2 metrics and surviving 
sequences must be stored. Each sequence is about 5kL bits long before the “final survivor” is selected. Thus 
Viterbi requires L < 10. The complexity goes as $2^K$ while the cost goes as $2^v$, where v is the number of
modulo-2 adders in the encoder. The scheme is good for hard- or soft-decision demodulators. If one starts and
stops at the zero, or topmost, node of the trellis, the transients in getting into and out of the trellis proper are called
the input transient and output flush. A truncated algorithm does not use the tail or flush bits. Tail-biting
preloads its trailing bits from the zero start node to enter the full trellis. It then starts decoding after the tail is
loaded.


7.2 Error Performance Bounds 

As for block codes, the probability of sequence (string or path) errors is found first for error performance
bounds. Then, the bit errors are bounded by terms involving the raw channel transition probability. Recall for
block codes that a syndrome indicated that a block was in error and then the bits were processed. Here, no flag
(syndrome) is found; instead, the sequence closest in Hamming distance that is consistent with the possible paths through
the trellis is chosen. Thus, the error counting is again not extremely crisp. If the generator function $T(x, y)$ is computed
(impractical in many cases), then for hard decisions (binary symmetric channel),

$$P_{block} = P_{string} < T(x)\Big|_{x=\sqrt{4p(1-p)}}$$

$$P_{bit} < \frac{1}{k}\,\frac{\partial T(x, y)}{\partial y}\bigg|_{y=1,\; x=\sqrt{4p(1-p)}}$$

For soft decisions,

$$P_{block} < T(x)\Big|_{x=\sqrt{4p(1-p)}}$$

$$P_{bit} < \frac{1}{2k}\,\frac{\partial T(x, y)}{\partial y}\bigg|_{y=1,\; x=e^{-rE_b/N_0}}$$

In general, the coding gain is bounded by

$$\frac{r\,d_{free}}{2} < \text{gain} < r\,d_{free}$$
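
As a quick worked instance, take the bound in the form $r\,d_{free}/2 < \text{gain} < r\,d_{free}$ and express both ends in decibels (the decibel conversion and the choice of code are assumptions made here for illustration). For the K = 3, r = 1/2 code of table 7.2, the series starts at $x^5$, so $d_{free} = 5$:

```latex
\begin{align*}
10\log_{10}\!\left(\frac{r\,d_{free}}{2}\right) &= 10\log_{10}(1.25) \approx 0.97\ \text{dB} \\
10\log_{10}\!\left(r\,d_{free}\right) &= 10\log_{10}(2.5) \approx 3.98\ \text{dB}
\end{align*}
```

The asymptotic gain of this short code therefore lies between roughly 1 and 4 dB.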

The Viterbi algorithm is used for raw bit error rates in the range $10^{-4}$ to $10^{-5}$ or better. The decision depth is
the number of steps into the trellis before a decision is made. When $T(x, y)$ is not feasible, use

where $B_d$ is the total number of ones on all paths of weight $d_{free}$, that is, the number of dotted branches on
all these paths.

The Viterbi algorithm may be summarized as follows: 

1. At time t, find the path metric for each path entering each state by adding the path metric of the survivor
at time t - 1 to the most recent branch metric.

2. For each state at time t, choose the path with the maximum metric as the survivor.

3. Store the state sequence and the path metric for the survivor at each state.

4. Increment the time t and repeat steps 1 to 3.

It is necessary to truncate the survivors to some reasonable length, called the decision depth δ. At time t, a
decision is forced at time t - δ by using some criterion, which may be found by trial and error in some cases.
As time progresses, it may be necessary to renormalize the metrics.


Example 7.1

Consider the trellis diagram for a particular (3,1,2) code (fig. 7.1). The bold line is a possible input path, and
the corresponding input and output sequences u and v of the encoder are

u = 10011010

v = 111 101 011 111 010 110 100 101

Figure 7.1.—Trellis diagram and arbitrary path through it.

Suppose the received sequence is

R = 111 101 111 111 000 110 101 101
            ↑        ↑        ↑

where the errors are denoted by arrows. The decoding steps can be summarized as follows: To simplify
matters, the Hamming distance between possible paths is used to select survivors. As stated earlier, real
decoders accumulate a metric, but using the closest path in Hamming distance is equivalent (and easier) for
example purposes.

Step 1: Figure 7.2 shows the paths that are needed to get past the transient and enter the complete trellis.
These four paths terminate at step 2 in the trellis, and their Hamming distances from the received path are

Path 1   input 00, output 000 000,   H = 5
Path 2   input 01, output 000 111,   H = 4
Path 3   input 10, output 111 101,   H = 0
Path 4   input 11, output 111 010,   H = 3

Figure 7.2.—Set of paths required in input transient phase (before complete trellis is available).


Step 2: Next, each path is extended from its node at step 2. For example, path 1 is extended to the
column 3 nodes with inputs 0 or 1 and with outputs 000 and 111, respectively. Call these paths 1a and 1b. Their
Hamming distances from the received sequence are

Path 1a   H = 5 + 3 = 8
Path 1b   H = 5 + 0 = 5

Since path 1b is closer in Hamming distance, it is the “survivor.” The extensions of the other paths are as
follows, where the “a” path is the uppermost one from a particular node:

Path 2a   H = 4 + 1 = 5
Path 2b   H = 4 + 2 = 6
Path 3a   H = 0 + 1 = 1
Path 3b   H = 0 + 2 = 2
Path 4a   H = 3 + 1 = 4
Path 4b   H = 3 + 2 = 5

Therefore, the survivors are paths 1b, 2a, 3a, and 4a (fig. 7.3). To simplify notation at this point, drop the letter
designation on the paths and call them just 1, 2, 3, and 4. Now, extending the paths to nodes in column 4,
where again the “a” and “b” extensions are used, gives

Path 1a   H = 5 + 1 = 6
Path 1b   H = 5 + 2 = 7
Path 2a   H = 5 + 1 = 6
Path 2b   H = 5 + 2 = 7
Path 3a   H = 1 + 3 = 4
Path 3b   H = 1 + 0 = 1
Path 4a   H = 4 + 1 = 5
Path 4b   H = 4 + 2 = 6

Figure 7.3.—Survivors at column 3 in trellis.

Then, the survivors are 1a, 2a, 3b, and 4a, with corresponding Hamming distances (fig. 7.4):

Path 1a   H = 6
Path 2a   H = 6
Path 3b   H = 1
Path 4a   H = 5

Figure 7.4.—Survivors at column 4.
The next extension yields the survivors at column 5 (fig. 7.5):

Path 1b   H = 7
Path 2a   H = 6
Path 3b   H = 2
Path 4a   H = 5

Figure 7.5.—Survivors at column 5.

The next extension gives the survivors at column 6 (fig. 7.6):

Path 1b   H = 8
Path 2b   H = 7
Path 3a   H = 2
Path 4b   H = 6

Figure 7.6.—Survivors at column 6.

The next step gives the survivors at column 7 (fig. 7.7):

Path 1b   H = 9
Path 2a   H = 7
Path 3b   H = 3
Path 4a   H = 6

Figure 7.7.—Survivors at column 7.

Note that path 3b differs from the received sequence in only three places and is the best choice for the
decoder to make. The next closest path has distance 6, which is much farther away. If the decoder were to make
the decision now to drop all other contenders, correct decoding would have occurred.
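
The hard-decision search of this example can be checked with a short program. The following is a minimal sketch rather than a production decoder. The tap table is inferred from the listed u and v (it is not stated in the text), the function names are hypothetical, and the sketch keeps one survivor per encoder state, processes all eight received blocks, and picks the closest survivor at the end instead of applying a decision depth.

```python
# Output bits contributed by u[t], u[t-1], u[t-2]; inferred from u and v (an assumption).
TAPS = ((1, 1, 1), (1, 0, 1), (0, 1, 1))

def branch_output(bit, state):
    """Three output bits for input `bit` leaving state (u[t-1], u[t-2])."""
    window = (bit,) + state
    return tuple(sum(w * TAPS[j][i] for j, w in enumerate(window)) % 2 for i in range(3))

def encode(bits):
    """Encode an input bit list, starting from the all-zero state."""
    state, out = (0, 0), []
    for b in bits:
        out.extend(branch_output(b, state))
        state = (b, state[0])
    return out

def viterbi_decode(received):
    """Hard-decision Viterbi search; returns (decoded bits, Hamming distance)."""
    survivors = {(0, 0): (0, [])}      # state -> (distance so far, input bits so far)
    for t in range(0, len(received), 3):
        block = tuple(received[t:t + 3])
        extended = {}
        for state, (dist, path) in survivors.items():
            for bit in (0, 1):
                out = branch_output(bit, state)
                d = dist + sum(a != b for a, b in zip(out, block))
                nxt = (bit, state[0])
                # Of the paths merging at nxt, keep only the closer one.
                if nxt not in extended or d < extended[nxt][0]:
                    extended[nxt] = (d, path + [bit])
        survivors = extended
    dist, path = min(survivors.values(), key=lambda s: s[0])
    return path, dist

u = [1, 0, 0, 1, 1, 0, 1, 0]
R = [int(c) for c in "111101111111000110101101"]
decoded, distance = viterbi_decode(R)
print(decoded, distance)
```

With these assumed taps, `encode(u)` reproduces the listed v, and the search recovers u at a Hamming distance of 3 from R.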


Michelson and Levesque (1985) give some tables of good codes for use with Viterbi’s algorithm. Their
notation corresponds to that used earlier as

(n, k, m) ↔ (v, b, k)

Their notation is (bk) stages, b bits/shift between registers, b bits/shift input to the encoder, v bits/shift out, rate
b/v, and constraint length k. Table 7.1 (our notation used) gives the constraint length (number of stages
in encoder) and the tap connection vectors. Here b = 1, and binary signaling is assumed. Table 7.2 gives the
very important derivative expressions for calculating the bit-error-rate curve. General derivative expressions
were given earlier while discussing error calculations, and those given in the table are quite useful for
practical calculations.
TABLE 7.1.—GOOD CODES FOR VITERBI DECODING

Constraint length, K    Connection vectors

r = 1/2; b = 1
3                       111, 101
4                       1111, 1101
5                       11101, 10011
6                       111101, 101011
7                       1111001, 1011011
8                       11111001, 10100111
9                       111101011, 101110001

r = 1/3; b = 1
3                       111, 111, 101
4                       1111, 1101, 1011
5                       11111, 11011, 10101
6                       111101, 101011, 100111
7                       1111001, 1110101, 1011011
8                       11110111, 11011001, 10010101


TABLE 7.2.—DERIVATIVE FUNCTIONS FOR CODES OF TABLE 7.1

K    dT(x,y)/dy evaluated at y = 1, x = sqrt(4p(1 - p))

r = 1/2
3    x^5 + 4x^6 + 12x^7 + 32x^8 + 80x^9 + 192x^10 + 448x^11 + 1024x^12 + 2304x^13 + ...
4    2x^6 + 7x^7 + 18x^8 + 49x^9 + 130x^10 + 333x^11 + 836x^12 + 2069x^13 + 5060x^14 + ...
5    4x^7 + 12x^8 + 20x^9 + 72x^10 + 225x^11 + 500x^12 + 1324x^13 + 3680x^14 + 8967x^15 + ...
7    36x^10 + 211x^12 + 1404x^14 + 11633x^16 + 76628x^18 + 469991x^20 + ...

r = 1/3
3    3x^8 + 15x^10 + 58x^12 + 201x^14 + 655x^16 + 2052x^18 + ...
4    6x^10 + 6x^12 + 58x^14 + 118x^16 + 507x^18 + 1284x^20 + 4323x^22 + ...
5    12x^12 + 12x^14 + 56x^16 + 320x^18 + 693x^20 + 2324x^22 + 8380x^24 + ...
6    x^13 + 8x^14 + 26x^15 + 20x^16 + 19x^17 + 62x^18 + 86x^19 + 204x^20 + 420x^21 + 710x^22 + 1345x^23 + ...
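
To show how a row of table 7.2 turns into a bit-error-rate estimate, the following minimal sketch evaluates the truncated hard-decision bound for the K = 3, r = 1/2 code. The function and variable names are hypothetical. The closed form used for comparison rests on the observation that the tabulated coefficients follow the pattern $(d-4)2^{d-5}$, whose infinite sum is $x^5/(1-2x)^2$ for $x < 1/2$; it is included only as a cross-check.

```python
import math

# Leading coefficients of dT/dy at y = 1 for the K = 3, r = 1/2 code (table 7.2):
# exponent of x -> coefficient.
K3_SERIES = {5: 1, 6: 4, 7: 12, 8: 32, 9: 80, 10: 192, 11: 448, 12: 1024, 13: 2304}

def bit_error_bound(p, series=K3_SERIES, k=1):
    """Truncated hard-decision bound (1/k) * sum(c_d * x**d), with x = sqrt(4p(1-p))."""
    x = math.sqrt(4.0 * p * (1.0 - p))
    return sum(c * x ** d for d, c in series.items()) / k

def closed_form_k3(p):
    """x**5 / (1 - 2x)**2, the K = 3 series summed to infinity (valid for x < 1/2)."""
    x = math.sqrt(4.0 * p * (1.0 - p))
    return x ** 5 / (1.0 - 2.0 * x) ** 2

for p in (1e-2, 1e-3, 1e-4):
    print(f"p = {p:g}: truncated bound = {bit_error_bound(p):.3e}")
```

For small p the first term dominates, so the bound behaves roughly like $x^{d_{free}}$.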
7.3 Sequential Decoding

Sequential decoding decodes by searching through the tree to make the best guess. Unlike Viterbi decoding, it does
not necessarily have to perform the $2^K$ operations per decoded block. Sequential decoding steps quickly
through the tree when r = v or nearly v. However, during noisy intervals it moves up and down the tree
searching for the best candidate paths.

7.3.1 ZJ or Stack Algorithm

In the ZJ or stack algorithm (fig. 7.8), metrics for each path are computed, stored, and bubble sorted in the
stack at each step. Then, the most likely path is extended along all possible branches, and the metrics are recomputed
and stored. If the decoder is on the right (low error) path, the accumulated metric should grow steadily. If the
path is incorrect, its metric will drop, and one of the lower paths in the stack will have its metric bubbled to the
top. The decoder goes back to the next candidate and starts a new search. In noisy conditions, the decoder can
spend so much time moving back and forth that the input storage buffer overflows. This buffer overflow due to
random search times is the practical limitation.

Since paths of different length must be compared, a reasonable method of metric computation must be
used. The most commonly used metric was introduced by Fano, thus the name Fano metric. It is


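$$M(r_i \mid v_i) = \log_2 \frac{p(r_i \mid v_i)}{p(r_i)} - R$$

where R = r is the code rate (note the clash in notation between the code rate (R or r) and the received vector $\mathbf{r}$
and its components $r_i$). The partial metric for the first $l$ branches of a path $\mathbf{v}$ is

$$M([\mathbf{r} \mid \mathbf{v}]_l) = \sum_{i=0}^{nl-1} \log_2 p(r_i \mid v_i) + \sum_{i=0}^{nl-1}\left[\log_2 \frac{1}{p(r_i)} - R\right]$$

Figure 7.8.—ZJ or stack algorithm.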
The first term is the metric in Viterbi’s algorithm; the second term represents a positive bias that increases
linearly with path length. Hence, longer paths have a larger bias than shorter ones, reflecting the fact that they
are closer to the end of the tree and more likely to be a part of the most likely path.

For the binary symmetric channel with transition probability p, the bit metrics are, from equation (7.1),

$$M(r_i \mid v_i) = \log_2 \frac{p}{1/2} - R = \log_2 2p - R, \qquad r_i \neq v_i$$

$$M(r_i \mid v_i) = \log_2 \frac{1-p}{1/2} - R = \log_2 2(1-p) - R, \qquad r_i = v_i$$

In the stack of the ZJ algorithm, an ordered list or stack of previously examined paths of different lengths is
kept in storage. Each stack entry contains a path along with its metric values. The path with the largest metric
is placed on top, and the others are listed in order of decreasing metric. Each decoding step consists of
extending the top path in the stack by computing the branch metrics of its $2^k$ succeeding branches and then
adding these to the metric of the top path to form $2^k$ new paths, called the successors of the top path. The top
path is then deleted from the stack, its $2^k$ successors are inserted, and the stack is rearranged in order of
decreasing metric values. When the top path in the stack is at the end of the tree, the algorithm terminates.
Figure 7.8 summarizes the idea.
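
The stack procedure just described can be sketched in a few lines. This is an illustrative sketch only: it reuses the (3,1,2) code and received sequence of example 7.1 with tap connections inferred from that example, assumes a binary symmetric channel with p = 0.1, uses hypothetical function names, and replaces the bubble sort with a binary heap (Python's `heapq`), which keeps the largest-metric path on top just as the sorted stack does.

```python
import heapq
import math

TAPS = ((1, 1, 1), (1, 0, 1), (0, 1, 1))   # assumed taps for u[t], u[t-1], u[t-2]
RATE = 1.0 / 3.0                           # code rate R of the (3,1,2) code

def branch_output(bit, state):
    """Three output bits for input `bit` leaving state (u[t-1], u[t-2])."""
    window = (bit,) + state
    return tuple(sum(w * TAPS[j][i] for j, w in enumerate(window)) % 2 for i in range(3))

def fano_branch_metric(out, block, p):
    """Sum of BSC bit metrics: log2(2(1-p)) - R on agreement, log2(2p) - R otherwise."""
    agree = math.log2(2.0 * (1.0 - p)) - RATE
    differ = math.log2(2.0 * p) - RATE
    return sum(agree if a == b else differ for a, b in zip(out, block))

def stack_decode(received, p=0.1):
    """ZJ (stack) search; returns (decoded bits, Fano metric of the final path)."""
    n_branches = len(received) // 3
    # heapq is a min-heap, so metrics are stored negated to keep the best path on top.
    stack = [(0.0, [], (0, 0))]
    while True:
        neg_metric, path, state = heapq.heappop(stack)    # delete the top path
        if len(path) == n_branches:                       # top path is at the end of the tree
            return path, -neg_metric
        block = received[3 * len(path): 3 * len(path) + 3]
        for bit in (0, 1):                                # insert its 2**k = 2 successors
            metric = -neg_metric + fano_branch_metric(branch_output(bit, state), block, p)
            heapq.heappush(stack, (-metric, path + [bit], (bit, state[0])))

R = [int(c) for c in "111101111111000110101101"]
decoded, metric = stack_decode(R)
print(decoded, round(metric, 3))
```

For this received sequence the correct path stays on top of the stack throughout, so the decoder never has to back up.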

7.3.2 Fano Algorithm

Before discussing the famous Fano algorithm, let us first consider the concept of breakout nodes. The plot 
in figure 7.9 is the accumulated metric versus depth into the tree. Nodes on the most likely path, wherein the 
metric never falls below this metric value, are called breakout nodes. The significance of this is that the decoder 
never goes back farther than the last such node. Simulation to show the distribution of breakout nodes at the 
same metric value gives a qualitative measure of noise bursts on the channel, with accompanying delays. 

The stack algorithm can require large storage to remember all terminal nodes of the paths examined during 
the decoding process. The fact that the next path to be searched farther is always readily available in the stack 
is the main reason for the simplicity of the algorithm. However, this advantage is paid for by the large storage 
needed. The Fano algorithm, on the other hand, uses little storage and does not remember the metrics of 
previously searched paths. Therefore, in order to know whether the current path under consideration has the 
highest metric and should be explored farther, the decoder uses a set of threshold values to test the acceptability 
of the paths. As long as the current threshold value is satisfied (i.e., the currently explored path has a higher 
metric than the current threshold value), the decoder is assumed to be moving along an acceptable path and 
proceeds forward. However, if the current threshold is violated, the algorithm stops proceeding and goes into
a search mode for a better path. Since no storage is used, the search for a better path is performed on a
branch-per-branch basis. From its current node, the decoder retraces its way back and attempts to find another path that
does not violate the current threshold value. The new path is then searched until trouble again occurs. The
details of the rules that govern the motion of the decoder are best explained by the flowchart in figure 7.10.

Figure 7.9.—Schematic path showing accumulation of metric and indicating breakout nodes.

Figure 7.10.—Fano algorithm.

Note that the decoder has both a search loop and a forward loop. The rules are as follows: 

1. A particular node is said to satisfy any threshold smaller than or equal to its metric value and to violate
any threshold larger than its metric value.

2. Starting at value zero, the threshold T changes its value throughout the algorithm by multiples of
increments Δ, a preselected constant.

3. The threshold is said to have been tightened when its value is increased by as many Δ increments as
possible without being violated by the current node’s metric.

4. The node being currently examined by the decoder is indicated by a search pointer.

5. In the flow diagram, when a node is tested, the branch metric is computed and the total accumulated
metric of that node is evaluated and compared with the threshold. A node may be tested in both a forward and
a backward move.

6. The decoder never moves its search pointer to a node that violates the current threshold.

7. The threshold is tightened only when the search pointer moves to a node never before visited.

Figure 7.11 is a more detailed flowchart. Finally, the received distance tree searched is shown schematically in
figure 7.12.
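
The rules above can be turned into a compact program. The sketch below is one possible reading of them, not a transcription of the flowcharts in figures 7.10 and 7.11. It assumes the (3,1,2) code of example 7.1 with tap connections inferred from that example, a binary symmetric channel with p = 0.1, a threshold increment Δ = 1, and the commonly used first-visit test (the metric of the node just left is below T + Δ); all names are hypothetical.

```python
import math

TAPS = ((1, 1, 1), (1, 0, 1), (0, 1, 1))   # assumed taps for u[t], u[t-1], u[t-2]
RATE = 1.0 / 3.0

def branch_output(bit, state):
    window = (bit,) + state
    return tuple(sum(w * TAPS[j][i] for j, w in enumerate(window)) % 2 for i in range(3))

def ranked_successors(state, block, p):
    """Both successors of a node as (branch metric, input bit), best first."""
    agree = math.log2(2.0 * (1.0 - p)) - RATE
    differ = math.log2(2.0 * p) - RATE
    scored = []
    for bit in (0, 1):
        out = branch_output(bit, state)
        scored.append((sum(agree if a == b else differ for a, b in zip(out, block)), bit))
    return sorted(scored, reverse=True)

def fano_decode(received, p=0.1, delta=1.0):
    n_branches = len(received) // 3
    path, ranks = [], []       # input bits taken and the rank (0 = best) of each branch
    metrics = [0.0]            # metrics[d] is the metric of the current path at depth d
    states = [(0, 0)]
    threshold, rank = 0.0, 0   # rank selects which successor to look at next
    while True:
        depth = len(path)
        block = received[3 * depth: 3 * depth + 3]
        branch_metric, bit = ranked_successors(states[-1], block, p)[rank]
        forward_metric = metrics[-1] + branch_metric
        if forward_metric >= threshold:              # forward loop: move the pointer ahead
            first_visit = metrics[-1] < threshold + delta
            path.append(bit)
            ranks.append(rank)
            metrics.append(forward_metric)
            states.append((bit, states[-1][0]))
            if len(path) == n_branches:
                return path
            if first_visit:                          # tighten T without violating the metric
                threshold += delta * math.floor((forward_metric - threshold) / delta)
            rank = 0
            continue
        while True:                                  # search loop
            if not path or metrics[-2] < threshold:
                threshold -= delta                   # cannot move back: loosen T
                rank = 0
                break
            came_from = ranks.pop()                  # move back one node
            path.pop()
            metrics.pop()
            states.pop()
            if came_from == 0:                       # the next-best branch is still untried
                rank = 1
                break

R = [int(c) for c in "111101111111000110101101"]
decoded = fano_decode(R)
print(decoded)
```

Only the current path is held in memory; the threshold takes the place of the stored metrics that the stack algorithm needs.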
7.4 Performance Characteristics for Sequential Decoding

The number of computations required to decode one information bit has asymptotically a Pareto distribution,
that is,

$$P(C > N) < \beta N^{-\alpha}, \qquad N \gg 1$$

where C is the number of computations and α is the Pareto exponent. This relationship was found by Gallager
(1968) and verified through simulation. Here, α and β are functions of the channel transition probabilities and
the code rate r. The code rate and exponent are related by

$$r = \frac{E_0(\alpha)}{\alpha}$$

where

$$E_0(\alpha) = \alpha - \log_2\left[(1-p)^{1/(1+\alpha)} + p^{1/(1+\alpha)}\right]^{1+\alpha}$$

is the Gallager function for the binary symmetric channel. The solution when α = 1 yields $r = R_0$, the
computational cutoff rate. In general, systems use 1 < α < 2. The value of $R_0$ sets the upper limit on the code
rate. For the binary input and continuous output (very soft) case,

$$R_0 = 1 - \log_2\left(1 + e^{-rE_b/N_0}\right)$$

and for the DMC/BSC

$$R_0 = 1 - \log_2\left[1 + 2\sqrt{p(1-p)}\right]$$

Then,

$$P_{bit} < \frac{2^{-KR_0/r}}{\left[1 - 2^{-(R_0/r - 1)}\right]^2}, \qquad r < R_0$$

where K is the constraint length.
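
These relations are easy to evaluate numerically. The following minimal sketch computes the Gallager function and the cutoff rate for the binary symmetric channel and solves $r = E_0(\alpha)/\alpha$ for the Pareto exponent by bisection. The function names, the bisection bracket, and the sample values p = 0.01 and r = 1/2 are assumptions made for illustration.

```python
import math

def gallager_e0(alpha, p):
    """E0(alpha) for the BSC with crossover probability p."""
    s = 1.0 / (1.0 + alpha)
    return alpha - (1.0 + alpha) * math.log2(p ** s + (1.0 - p) ** s)

def cutoff_rate_bsc(p):
    """R0 = 1 - log2(1 + 2 sqrt(p(1 - p)))."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))

def pareto_exponent(rate, p, lo=1e-6, hi=50.0, iterations=200):
    """Solve rate = E0(alpha)/alpha by bisection; E0(alpha)/alpha decreases with alpha.

    Assumes the solution lies between lo and hi.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gallager_e0(mid, p) / mid > rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 0.01
r0 = cutoff_rate_bsc(p)
alpha_half = pareto_exponent(0.5, p)
print(f"R0 = {r0:.3f}, Pareto exponent at r = 1/2: {alpha_half:.2f}")
```

A rate below $R_0$ gives an exponent above 1, and the exponent falls to 1 as the rate approaches $R_0$.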
Chapter 8

Summary

Chapters 4 and 7 provide the formulas needed to plot bit error rate versus signal-to-noise ratio $E_b/N_0$ for
either block or convolutional codes. From these plots, the code gain at a prescribed bit error rate can be 
inferred. The real system issues of cost/complexity, robustness to bursts, etc., cannot be so neatly handled. 
Most block codes are decoded by using hard decisions; convolutional codes are often soft implementations. 
Since block decoders work on code words (blocks), the calculation of bit error rates is always approximate; 
chapter 4 covers this in detail. In many applications, block codes are used only for error detection. The 
calculation of bit error rates for convolutional codes is also somewhat vague, but for certain cases the results 
are rather crisp. 

The theoretical and practical limits of code rate are channel capacity C and computational cutoff rate $R_0$,
respectively. For the binary symmetric channel, they are

$$C = 1 + p\log_2 p + (1-p)\log_2(1-p)$$

$$R_0 = 1 - \log_2\left[1 + 2\sqrt{p(1-p)}\right]$$

For binary or quadrature phase shift keying with an analog decoder (infinitely soft) on the AWGN channel,
they are

$$C = \frac{1}{2}\log_2\left(1 + 2r\frac{E_b}{N_0}\right)$$

$$R_0 = 1 - \log_2\left[1 + \exp\left(-r\frac{E_b}{N_0}\right)\right]$$

These require

$$\frac{E_b}{N_0} > \frac{2^{2r} - 1}{2r} \quad \text{for } r < C$$

$$\frac{E_b}{N_0} > -\frac{1}{r}\ln\left(2^{1-r} - 1\right) \quad \text{for } r < R_0$$
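
To put numbers on these limits, the following minimal sketch inverts the AWGN expressions $C = \frac{1}{2}\log_2(1 + 2rE_b/N_0)$ and $R_0 = 1 - \log_2[1 + \exp(-rE_b/N_0)]$ for the smallest usable $E_b/N_0$ at a given code rate. The function names are hypothetical, and the inversions are derived here from those two expressions rather than quoted from the text.

```python
import math

def min_ebn0_capacity(rate):
    """Smallest Eb/N0 (linear) for which rate < C = 0.5 * log2(1 + 2 * rate * Eb/N0)."""
    return (2.0 ** (2.0 * rate) - 1.0) / (2.0 * rate)

def min_ebn0_cutoff(rate):
    """Smallest Eb/N0 (linear) for which rate < R0 = 1 - log2(1 + exp(-rate * Eb/N0))."""
    return -math.log(2.0 ** (1.0 - rate) - 1.0) / rate

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

for rate in (1.0 / 3.0, 1.0 / 2.0, 3.0 / 4.0):
    print(f"r = {rate:.3f}: capacity limit {to_db(min_ebn0_capacity(rate)):6.2f} dB, "
          f"cutoff limit {to_db(min_ebn0_cutoff(rate)):6.2f} dB")
```

For r = 1/2 the capacity limit is 0 dB and the cutoff-rate limit is about 2.5 dB, which illustrates the gap between the theoretical and the practical limit.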
Figure 8.1.—Concatenated coding/decoding scheme along with interleaver/deinterleaver pair.

Figure 8.2.—Block interleaver (a), fill registers by column, read out by rows; and convolutional interleaver (b).
For convolutional codes,

$$\text{BER} \sim 2^{-KR_0/r}$$


Interleaving and concatenated codes are useful in that they break up large bursts as well as maximize the 
power and minimize the shortcomings of particular codes. Figure 8.1 shows an outer Reed-Solomon code with 
an inner convolutional one, as well as an interleaver for generality. The interleaver breaks up bursts into smaller 
ones that can be handled by the inner convolutional code. This inner decoder tends to make errors in bursts; 
then, the Reed-Solomon decoder can clean them up. 

Recall that a burst may be defined as follows: Let the burst length be t = mb and assume an m x n (row
times column) interleaver. The interleaver breaks the burst into m smaller ones, each of length b. Recall that an
(n,k) block code can correct a burst with length

$$b \le \frac{n-k}{2}$$

Interleavers are used when a bursty channel exists (e.g., fading due to multipath, or grain defects in magnetic
storage). Viterbi decoders are better than sequential decoders on bursty channels, even though both are poor.

Example 8.1

Figure 8.2 shows two interleavers, both a block and a convolutional type. For block interleaving, the 
transmitter reads encoder output symbols into a memory by columns until it is full. Then, the memory is read 
out to the modulator by rows. While one memory is filling, another is being emptied, so that two are needed. 
At the receiver, the inverse operation is effected by reading the demodulator output into a memory by rows and 
reading the decoder input from the memory by columns; two memories are also needed. For the convolutional 
case, all multiplexers in the transmitter and receiver are operated synchronously. The multiplexer switches 
change position after each symbol time, so that successive encoder outputs enter different rows of the 
interleaver memory. Each interleaver and deinterleaver row is a shift register, which makes the implementation 
straightforward. 
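
The block interleaver of this example can be sketched as follows. The function names and the 3 x 4 memory size are illustrative assumptions; the double buffering mentioned above (one memory filling while the other empties) is omitted for brevity.

```python
def block_interleave(symbols, rows, cols):
    """Write symbols into a rows x cols memory by columns, then read it out by rows."""
    assert len(symbols) == rows * cols
    memory = [[None] * cols for _ in range(rows)]
    for index, s in enumerate(symbols):
        memory[index % rows][index // rows] = s       # fill column by column
    return [memory[r][c] for r in range(rows) for c in range(cols)]

def block_deinterleave(symbols, rows, cols):
    """Write received symbols into the memory by rows, then read it out by columns."""
    assert len(symbols) == rows * cols
    memory = [symbols[r * cols:(r + 1) * cols] for r in range(rows)]
    return [memory[r][c] for c in range(cols) for r in range(rows)]

data = list(range(12))                     # stand-in for encoder output symbols
sent = block_interleave(data, rows=3, cols=4)
restored = block_deinterleave(sent, rows=3, cols=4)
burst = sorted(sent[0:4])                  # encoder positions hit by a 4-symbol channel burst
print(sent, burst)
```

A burst of four adjacent channel symbols lands on encoder positions that are three apart, so after deinterleaving the decoder sees isolated errors instead of one long burst.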


▲ 
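Appendix A

Q Function

The Q function is defined as

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-u^2/2}\,du$$

with the property

$$Q(-x) = 1 - Q(x), \qquad Q(0) = \frac{1}{2}$$

It is related to the error function

$$\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du$$

by

$$Q(x) = \frac{1}{2}\,\text{erfc}\left(\frac{x}{\sqrt{2}}\right)$$

where

$$\text{erfc}(x) = 1 - \text{erf}(x)$$

Observe that

$$\text{erfc}(x) = 2Q\left(\sqrt{2}\,x\right)$$

$$\text{erf}(x) = 1 - 2Q\left(\sqrt{2}\,x\right)$$

The function is bounded as follows, where the bounds are close for x > 3:

$$\left(1 - \frac{1}{x^2}\right)\frac{1}{x\sqrt{2\pi}}\,e^{-x^2/2} < Q(x) < \frac{1}{x\sqrt{2\pi}}\,e^{-x^2/2}$$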
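
For quick calculations, the Q function can be evaluated through the complementary error function in a standard math library. The following minimal sketch uses Python's `math.erfc`; the function names `q_function` and `q_bounds` are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_bounds(x):
    """Lower and upper bounds on Q(x) for x > 0."""
    upper = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    return (1.0 - 1.0 / (x * x)) * upper, upper

for x in (1.0, 3.0, 5.0):
    lower, upper = q_bounds(x)
    print(f"x = {x}: {lower:.3e} < Q = {q_function(x):.3e} < {upper:.3e}")
```

The printed values show the two bounds closing in on Q(x) as x grows.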






Appendix B 

Glossary 

A 

Adder, modulo-2: Same as exclusive-OR

⊕   0   1
0   0   1
1   1   0

ADM: Adaptive delta modulation 
AGC: Automatic gain control 

Algorithm: Berlekamp, Euclid, Fano, Hartmann-Rudolph, Omura, stack, threshold, Viterbi 

ARQ: Automatic repeat request (used extensively in computers); a reverse channel is needed to alert the sender 
that the message was received in error, and a retransmission is needed. 

Asymptotic coding gain: Block, $G \sim r(t + 1)$; convolutional, $r\,d_{free}/2 < G < r\,d_{free}$

Augmented code: Adding new code words to an existing code 
AWGN: Additive white Gaussian noise 

B 

BCH: Bose-Chaudhuri-Hocquenghem 

BCH bound: Lower bound for minimum distance for such codes, $d_{min} \ge 2t + 1$, where 2t + 1 is the “design
distance”

BEC: Binary erasure channel 
BER: Bit error rate 

Bit: Measure of information; a “unit” used to measure information 
Binit: Binary digit 0 or 1 ; also called a “bit” 
Bounded distance decoding: Algorithm wherein decoder corrects all errors of weight t or less but no others
Bounds on distance for block codes: Upper and lower bounds for $d_{min}$
BPSK: Binary phase shift keying
BSC: Binary symmetric channel

Burst: A sequence wherein the first and last binit are in error but those in between may be correct (50/50);
bursts often occur in groups (i.e., bursts of bursts).


C 

Catastrophic error propagation: Some convolutional codes can generate infinite errors from a few in the 
“wrong place”; of academic interest only. 

CCITT: International Telegraph and Telephone Consultative Committee
CDF: Column distance function; a measure used in convolutional codes
Check bit: One of n-k redundant binits

Chien search: Searching error locator polynomial σ(x) to find its roots. The error location numbers are the
reciprocals of these roots.

Code rate: For blocks, k/n; for convolutional codes, kL/n(L + m)

Code word: One of $2^k$ words out of total $2^n$ n-tuples

Coding gain: The difference in $E_b/N_0$ between a coded and uncoded channel at a specified BER
Computational cutoff rate: Maximum rate for sequential decoder to handle, $R_0$
Concatenated codes: Two codes in tandem, “outer” and “inner” codes 

Constraint length (CL): Number of output bits that single input bit affects; there are many uses for this term. 
Coset: Row of code vectors in standard array 
CRC: Cyclic redundancy check 





D 

DEQPSK: Differentially encoded quadrature phase shift keying 

Detection: Demodulator step to recover signal; coherent means the carrier and its phase are required, whereas 
incoherent does not use carrier phase information. 

Distance (designed): Minimum guaranteed distance, 2t + 1 

Distance (Euclidean): Normal vector length between points, $d^2 = (p_i - s_i)^2$, where $p_i$ and $s_i$ are voltage levels
in a demodulator

Distance (free): Minimum-weight code word in a convolutional code, $d_{free}$; it can be of any length and is
generated by a nonzero information sequence.

Distance (Hamming): See Hamming distance. 

DM: Delta modulation 
DMC: Discrete memoryless channel 
DMS: Discrete memoryless source
DPCM: Differential pulse code modulation 
DPSK: Differential phase shift keying 

E 

$E_b/N_0$: Ratio of energy per bit to noise power level

Erasure: An output of a demodulator that means no decision; neither zero nor one can be chosen.

Ergodic process: Each choice is independent of all previous ones. 

Error pattern: Vector added by channel noise, received vector = transmitted + error 

Expurgated code: Code with some code words discarded, often leaving remaining words with even weights 

Extended code: Code with more check bits added, chosen such that the weight structure is improved 


F 

Fano algorithm: Original tree search (sequential) algorithm; it moves quickly through the tree when errors are
few; it slows proportionally with error rate and thus adapts to noise level, unlike Viterbi’s algorithm, which
calculates $2^K$ values in each step.

FEC: Forward-error correcting (only case considered herein) 

Field (Galois): See Galois field. 
Fire codes: Burst-error-correcting codes

Fractional rate loss: m/(L + m), for convolutional codes, where L denotes information bit length and
m denotes memory order (the maximum length of all shift registers)


G 

Galois Field: Finite field of elements where number of elements is prime number or a power of prime 
number; codes are developed so that the code words become elements of such fields, GF (q). 

GBC: General binary channel 
GCD: Greatest common divisor 

Golay code: Famous (23,12) perfect code whose properties have been studied exhaustively 

H 

Hamming codes: Defined by $n = 2^m - 1$, $n - k = m$, where m = 1, 2, 3, ...; $d_{min} = 3$ and t = 1

Hamming distance: Number of positions by which two code words differ 

Hamming weight: Number of ones in a code word 

I 

Interleaving: Block or convolutional; interleaving breaks up code blocks before transmission. After reception,
the inverse reconstruction process tends to break up noise bursts added by the channel. Smaller bursts are 
more easily decoded. 

ISI: Intersymbol interference 

L 

LCM: Lowest common multiple 

Lengthened code: A code where parity check symbols have been added 

Linear code: Sum of two code words is also a code word. Most practical codes are linear. 

LPC: Linear product code 


M 

M-ary signaling: Modulator output is segmented into blocks of j bits. Then, each sequence is mapped into a
waveform. There are $M = 2^j$ such waveforms. For example,

1001 — > ^(waveform) 

0111 — ^ ^2 (waveform) 
Here, j = 4, so that $M = 2^4 = 16$ waveforms.

Markov process: Each choice depends on previous ones but no farther back in the sequence of choices. 

Message: Ordering of letters and spaces on a page 

MFSK: Multiple-frequency shift keying 

MLD: Maximum-likelihood decoder

MLSR: Maximum-length shift register codes 

MSK: Minimum shift keying 


Noise averaging: A method of understanding how codes work. The redundancy increases the uniqueness of 
words to help decoding, and noise averaging allows the receiver to average out the noise (by matched 
filtering) over long time spans, where T (word length) becomes very large. 


NASA code use: The Mariner probes (1969-76) used a Reed-Muller ($2^m$, m+1) code (32,6) and decoded by
a finite-field transform algorithm (over GF($2^m$)). Then from 1977 on, NASA switched to a convolutional
code with K = 7 constraint length and m = 6 memory.


O 

ODP: Optimum distance profile, a code whose distance profile is best for a given constraint length 
OKQPSK: Offset-keyed quadrature phase shift keying 

OOK: On-off keying 


P 

Pareto distribution: Number of calculations C exceeding N in sequential decoder is given by this distribution:

$$P(C > N) = \beta N^{-\alpha}, \qquad N \gg 1$$

Perfect code: Code that corrects all patterns of t (or fewer) errors but no more. The Golay (23,12), Hamming 
( t = 1), and repetition ( n odd) codes are the only known perfect binary codes. 

Punctured code: Code where some parity check symbols have been deleted 

Q 

QPSK: Quadrature phase shift keying 

Quick look-in: A convolutional code wherein two sequences can be added to get information; there is thus no 
decoding step in the classical sense. 
R

$R_0$: Computational cutoff rate; replaces channel capacity C in a practical sense. The decoder cannot operate
properly if $r > R_0$ (r = k/n).

Reed-Muller codes: Cyclic codes with overall parity check digit added 

Reed-Solomon (R-S) code: Very popular nonbinary block code that is good for bursts. Its structure is known 
and thus its error probabilities are calculable with good accuracy. 

Repetition: Code formed by repeating a given symbol a fixed number of times 
S 

SEC: Single-error correcting 

SECDED: Single-error-correcting, double-error-detecting code; an extended Hamming code (n,k) → (n+1,k)
(i.e., one added overall parity check) is an example

SNR: Often, ratio of signal to noise energy per bit (symbol). SNR = n = a 2 ^, where E b is waveform 
energy and a is amplitude of received signal r(f) 

Source coding: Includes PCM, DPCM, DM (delta modulation), ADM (adaptive delta modulation), and LPC. 
In contrast, channel coding increases alphabet and cost, adds bandwidth, and needs a decoder. 

Source rate: R = H/T bits per message per second

Syndrome: Sum of transmitted and locally generated checks at receiver 


Tail biting: A convolutional encoding scheme wherein a block of bits L+m long is processed as follows. The
last m bits are fed into the encoder to initialize it, but the output is ignored. Then, L+m bits are encoded and
transmitted. This eliminates the normal zero tail flush bits and gives the last m bits the same amount of
coding protection that the L bits possess.

Tree codes: Codes wherein encoder has memory. Convolutional codes are a subset of tree codes. 


U 

Undetected error: The case wherein the error vector is itself a code word, so that its sum with the message
vector creates a valid code word.

$$\text{Prob(block error undetected)} = \sum_{j=1}^{n} A_j\,p^{j}(1-p)^{n-j}$$

where $A_j$ denotes weight distribution value.
V

Viterbi: Very popular decoding algorithm for short-constraint-length (K < 10) convolutional codes; often
K = 7, r = 1/2. Works well when uncoded bit error rate is about $10^{-4}$ or $10^{-5}$.


W 

Weight distribution: Knowing the sequence $A_j$, where $A_j$ is the number of code words with weight j, means
knowing the code structure very well. 

Weight (Hamming): See Hamming weight. 


Z 

ZJ algorithm: Stack algorithm in sequential decoding. The most promising path in the tree has its 
accumulated metric bubbled to the top of the stack. 
Appendix C

Symbols 


A_i            number of code words with specific weight i
a_i            ith digit in message vector
b              burst length
C              channel capacity
C_i            code word, v_m G
C_R            leading coefficients
C_1,C_2,C_3    checks
D              distance in Reed-Solomon codes
D_min          minimum distance in Reed-Solomon codes
d              distance profile
d_free         minimum-weight code word in convolutional code
               Hamming distance
d_min          distance between two closest code words
E_b            energy per information bit
E_c            energy per coded bit
E_m            message energy
E_s            energy in channel symbol
E_0(a)         Gallager function
e              noise vector; error vector; error pattern



               assumed error vector
               number of errors corrected
e_d            number of errors detected
f              distribution function of system (particle density function)
G              asymptotic gain
G              code generator matrix
GF(p)          Galois fields
g(x)           generator polynomial
H              parity check matrix
H(x)           average self-information or self-entropy in any message symbol
H(y)           entropy of output channel
H^T            transpose of parity check matrix
H_s            entropy of code
I              quantity of information
K              constraint length
k              number of input bits; number of alphabetic symbols; Boltzmann's constant
L              information bit length
ℓ              number of errors; length of shortest burst in code word
M              path metric; movement parameter
M_b            backward-movement parameter in Fano algorithm
M_f            forward-movement parameter in Fano algorithm
m              memory order; number of slots per page; number of terms; number of message symbols


m              message vector
m              output
N              average noise power; number of permutations; number of particles; number of equally likely
               messages
N_c            number of code words
N_e            number of n-tuples
N_0            noise power spectral density
n              number of output bits
n_A            constraint span
P              received power in modulated signal
P              probability; number of erasures corrected; transition probability
               probability of decoding error
P_b            probability that block is decoded incorrectly
P_b            probability of received bit error
(P_b)_s        resulting bit error rate into data sink
P_bit          probability of bit error
P_c            probability of bit error when coding is applied; coded channel bit
P_e            probability of decoder error
P_i            probability of any symbol occurring, 1/m
P_m            probability of message error
(P_m)_c        block error rate for coding
(P_m)_u        probability that uncoded block message will be received in error
P_symbol       channel symbol error rate
P_u            uncoded channel bit
P_u(Z)         probability of undetected error
P_w            probability of word error
Q              heat; number of source symbols
R              bit rate; code rate; data rate or information symbol rate
R_max          maximum information symbol rate
R_s            symbol rate; channel symbol rate; chip rate; baud; etc.
               computational cutoff rate
               code rate
               received vector; volume of real space
               components of received vector
r_0            computational cutoff rate (in sequential decoding)
S              entropy; signal power
S              n-k vector; syndrome
s_i            partial syndrome; voltage level in demodulator
T              threshold; time; Nyquist rate for no ISI, 1/(2W)
T(x)           generating function
               duration of code word
               time; number of correctable errors in received word
               transmitted waveform
               assumed input vector
               code vector; code vector sequence
               input vector in convolutional code
               received waveform
               code vector
               message vector
               output vector in convolutional code; volume of velocity space; transmitted code word
v_x,v_y,v_z    velocity components
W              thermodynamic probability of state of interest; bandwidth


x              number of erasures
x              symbol emitted by source
x_i            message symbol
x,y,z          coordinates
y              received sequence
z              Hamming distance between r and v
Z              received vector
α              event; number of information symbol errors per word error; symbol; Pareto exponent;
               amplitude of received signal
               error number locations
β              event; general element
β_i            number of remaining errors in decoded block
γ_i            number of coset leaders of weight i
ρ_i            voltage level in demodulator
φ(x)           smallest degree polynomial